FOREWORD


The Software Engineering Laboratory (SEL) is an organization sponsored by the National 
Aeronautics and Space Administration/Goddard Space Flight Center (NASA/GSFC) and created 
to investigate the effectiveness of software engineering technologies when applied to the 
development of application software. The SEL was created in 1976 and has three primary 
organizational members: 

NASA/GSFC, Software Engineering Branch 

University of Maryland, Department of Computer Science 

Computer Sciences Corporation, Software Engineering Operation 

The goals of the SEL are (1) to understand the software development process in the GSFC 
environment; (2) to measure the effect of various methodologies, tools, and models on this 
process; and (3) to identify and then to apply successful development practices. The activities, 
findings, and recommendations of the SEL are recorded in the Software Engineering Laboratory 
Series, a continuing series of reports that includes this document. 

Single copies of this document can be obtained by writing to 

Software Engineering Branch 
Code 552 

Goddard Space Flight Center 
Greenbelt, Maryland, U.S.A. 20771 
SECTION 1 - INTRODUCTION


This document is a collection of selected technical papers produced by participants in the 
Software Engineering Laboratory (SEL) from September 1994 through November 1995. The 
purpose of the document is to make available, in one reference, some results of SEL research that 
originally appeared in a number of different forums. This is the 13th such volume of technical
papers produced by the SEL. Although these papers cover several topics related to software 
engineering, they do not encompass the entire scope of SEL activities and interests. Additional 
information about the SEL and its research efforts may be obtained from the sources listed in the 
bibliography at the end of this document. 

For the convenience of this presentation, the nine papers contained here are grouped into four 
major sections: 

• Software Engineering Laboratory (Section 2) 

• Software Models (Section 3) 

• Software Measurement (Section 4) 

• Technology Evaluations (Section 5) 

Section 2 includes several papers and articles that describe the SEL’s process improvement
program and the Experience Factory and its relationship to other improvement approaches.
Section 3 contains a case study that uses the Actor-Dependency Model to analyze and assess a 
large software maintenance organization. Section 4 includes four papers. The first describes a 
rigorous and disciplined approach to defining product metrics, and the second evaluates property- 
based metrics defined using this approach. The third paper in Section 4 gives a study that uses 
error data to better understand and evaluate an evolving reuse process, and the fourth paper 
presents an experimental investigation of a suite of object-oriented design metrics. Finally, 
Section 5 contains an experience report that describes using domain analysis to create a library of
highly reusable components that can be configured within a standard architecture to
produce low-cost systems.

The SEL is actively working to understand and improve the software development process at the 
Goddard Space Flight Center (GSFC). Future efforts will be documented in additional volumes 
of the Collected Software Engineering Papers and other SEL publications. 
SECTION 2 - THE SOFTWARE ENGINEERING LABORATORY


The technical papers included in this section were originally prepared as indicated below. 

• "SEL's Software Process-Improvement Program," V. Basili, M. Zelkowitz, F. McGarry, 
G. Page, S. Waligora, and R. Pajerski, IEEE Software, vol. 12, no. 6, November 1995, 
pp. 83-87 

• The Experience Factory Strategy and Practice, V. R. Basili and G. Caldiera, University 
of Maryland, Computer Science Technical Report, CS-TR-3483, UMIACS-TR-95-67, 
May 1995 

• "The Experience Factory and Its Relationship to Other Quality Approaches," V. R. 
Basili, Advances in Computers, vol. 41, Academic Press, Incorporated, 1995 
SEL'S SOFTWARE PROCESS-IMPROVEMENT PROGRAM


In 1993, the IEEE Computer Society and the Software Engineering Institute jointly established
the Software Process Achievement Award to recognize outstanding improvement accomplishments.
This award is to be given annually if suitable nominations are received by the SEI
before November 1 each year. The nominations are reviewed by an award committee of Barry
Boehm, Manny Lehman, Bill Riddle, myself, and Vic Basili (who did not participate in this
award decision because of his involvement in the Software Engineering Laboratory).

It is particularly fitting that the SEL was selected as the first winner for this award. They
started their pioneering work nearly a decade before the Software Engineering Institute was
founded, and their work has been both a guide and an inspiration to all of us who have attempted
to follow in their footsteps.

— Watts Humphrey 


VICTOR BASILI 
and MARVIN ZELKOWITZ 
University of Maryland 

FRANK McGARRY, 
JERRY PAGE, 
and SHARON WALIGORA
Computer Sciences Corporation 

ROSE PAJERSKI 
NASA Goddard Space 
Flight Center 


For nearly 20 years, the Software Engineering Laboratory has worked to understand, assess,
and improve software and the software-development process within the production environment
of the Flight Dynamics Division of NASA’s Goddard Space Flight Center. We have conducted
experiments on about 125 FDD projects, applying, measuring, and analyzing numerous
software-process changes. As a result, the SEL has adopted and tailored processes, based on
FDD goals and experience, to significantly improve software production.

The SEL is a cooperative effort of NASA/Goddard’s FDD, the University of Maryland Department
of Computer Science, and Computer Sciences Corporation’s Flight Dynamics Technology Group.
It was established in 1976 with the goal of reducing

♦ the defect rate of delivered software,

♦ the cost of software to support flight projects, and

♦ the average time to produce mission-support software.

Our work has yielded an 
extensive set of empirical 
studies that has guided the 
evolution of standards, man- 
agement practices, technolo- 
gies, and training within the 
organization. The result has 
been a 75 percent reduction 
in defects, a 50 percent reduc- 
tion in cost, and a 25 percent 
reduction in cycle time. Over 
time, the goals of SEL have 
matured. We now strive to: 

♦ Understand baseline processes and product characteristics, such as cost, reliability,
software size, reuse levels, and error classes. By characterizing a production environment,
we can gain better insight into the software process and its products.

♦ Assess improvements that have been incorporated into development projects. By measuring
the impact of available technologies on the software process, we can determine which
technologies are beneficial to the environment and, most importantly, how the technologies
should be refined to best match the process with the environment.

♦ Package and infuse improvements into the standard SEL process and update and refine
standards, handbooks, training materials, and development-support tools [1-3]. By
identifying process improvements, we can package the technology so it can be applied in
the production environment.


As Figure 1 shows, these goals are pursued in a sequential, iterative process that has been
formalized by Basili as the Quality Improvement Paradigm [4] and its use within the SEL
formalized as the Experience Factory [5].

Figure 1. The SEL goals are pursued in a sequential, iterative fashion. The diagram includes
some of the many SEL studies that have been conducted over the years, including those of
Cleanroom, Ada, and Fortran.


IMPROVING THE PROCESS

We select candidates for process change on the basis of quantified SEL experiences (such as
the most significant causes of errors) and clearly defined goals for the software (such as
to decrease error rates). After we select the changes, we provide training and formulate
experiment plans. We then apply the new process to one or more production projects and take
detailed measurements. We assess a process’s success by comparing these measures with the
continually evolving baseline. Based upon the results of the analysis, we adopt, discard, or
revise the process.

Process improvement applies to individual projects, to experiments (the observation of two
or three projects), and to the overall organization (the observation of trends over many
years). In the early years, the SEL emphasized building a clear understanding of the process
and products within the environment. This led us to develop models, relations, and general
characteristics of the SEL environment. Most of our process changes consisted of studying
specific, focused techniques (such as program-description language, structure charts, and
reading techniques), but the major enhancement was the infusion of measurement,
process-improvement concepts, and the realization of the significance of process in the
software culture.

SEL OPERATIONS

The SEL has collected and archived data on more than 125 of its software-development
projects. We use the data to build typical-project profiles against which we compare and
evaluate ongoing projects. The SEL provides its managers with tools for monitoring and
assessing project status. The FDD typically runs six to 10 projects simultaneously, each of
which is considered an experiment within the SEL.

For each project, we col- 
lect a basic set of informa- 
tion (such as effort and 
error data). From there, the 
data we collect may vary 
according to the experiment 
or be modified as changes 
are made to specific 
processes (such as the use of 
Ada). As the information is 
collected, it is validated and 
placed in a central database. 
We then use this data with 
other information — such 


as the subjective lessons 
learned — to analyze the 
impact of a specific software 
process and to measure and 
feed back results to both 
ongoing and follow-on pro- 
jects. 

We also use the data to 
build predictive models and 
to provide a rationale for 
refining current software 
processes. As we analyze the 
data, we generate papers 
and reports that reflect the 
results of numerous studies. 
We also package the results 
as standards, policies, train- 
ing materials, and manage- 
ment tools. 


PROCESS AND PRODUCT 
ANALYSIS 

The FDD is responsible 
for the development and 
maintenance of flight- 
dynamics ground-support 
software for all Goddard 
flight projects. Typical 
FDD projects range in size 
from 100,000 to 300,000 
lines of code. Several pro- 
jects exceed a million lines 
of code; others are as small 
as 10,000 lines of code. (At 
SEL, reused code is not 
“free”; it is counted as 20 
percent of new Fortran code 
and 30 percent of new Ada 
code.) The SEL improve- 
ment goal is to demonstrate 
continual improvement of 
the software process within 
the FDD environment by 
carrying out analysis, mea- 
surement, and feedback to 
projects within this environ- 
ment. 
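
The reuse weighting mentioned above can be illustrated with a short calculation. This is a minimal sketch, not SEL tooling: the function name `effective_size` and the sample line counts are hypothetical, and only the 20 percent (Fortran) and 30 percent (Ada) weights come from the text.

```python
# Weights from the text: a reused line counts as a fraction of a new line.
REUSE_WEIGHT = {"fortran": 0.20, "ada": 0.30}


def effective_size(new_lines: int, reused_lines: int, language: str) -> float:
    """Return project size in equivalent new lines of code.

    Hypothetical helper: new lines count in full, reused lines are
    discounted by the language-specific weight quoted in the text.
    """
    weight = REUSE_WEIGHT[language.lower()]
    return new_lines + weight * reused_lines


# Illustrative (made-up) project: 60,000 new and 40,000 reused lines.
fortran_size = effective_size(60_000, 40_000, "Fortran")
ada_size = effective_size(60_000, 40_000, "Ada")
print(f"Fortran: {fortran_size:,.0f} equivalent lines")  # Fortran: 68,000 equivalent lines
print(f"Ada:     {ada_size:,.0f} equivalent lines")      # Ada:     72,000 equivalent lines
```

Under this assumed reading, the same mix of new and reused code is counted as a slightly larger project in Ada than in Fortran, because reused Ada code is weighted more heavily.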

Figure 2. Effort distribution by (A) life-cycle phase and (B) activity. Phase data counts
hours charged to a project during each calendar phase. Activity data counts hours attributed
to a particular activity (as reported by the programmer), regardless of when in the life
cycle the activity occurred.

Figure 3. Results of three completed Cleanroom projects, compared against the SEL baseline.

Understanding. Understanding what an organization does and how it operates is fundamental to
any attempt to plan, manage, or improve the software process. This is especially true for
software-development organizations. The SEL supports this understanding in several ways,
including, for example, the study of effort distribution and error-detection rate.

♦ Effort distribution 
identifies which phases of 
the life cycle consume which 
portion of development 
effort. Figure 2 presents the 
effort distribution of 11 
Fortran projects by life-cycle 
phase and activity. Under- 
standing these distributions 
helps us plan new efforts, 
evaluate new technologies, 
and assess the similarities 
and differences within an 
ongoing project. 

♦ Error-detection rate 
provides the absolute error 
rate expected in each phase. 
At SEL, we collected infor- 
mation on software errors 
and built a model of the 
expected errors in each life- 
cycle phase. For 1,000 lines 
of code, we found about 
four errors during imple- 
mentation; two during sys- 
tem test; one during accep- 
tance test; and one-half dur- 
ing operation and mainte- 
nance. The trend we derive 
from this model is that 
error detection rates fell by 
50 percent in each subse- 
quent phase. This pattern 
seems to be independent of 
the actual error rates; it is 
true even in recent projects, 
in which the overall error
rates are declining. We use
this model of error rates, as 
well as other similar types 
of models, to better predict, 
manage, and assess change 
on newly developed pro- 
jects. 
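
The halving pattern described above can be written as a simple geometric model. The sketch below is illustrative only: the function name and its parameters are assumptions, while the phase names, the starting rate of about four errors per 1,000 lines, and the 50 percent drop per phase are taken from the text.

```python
PHASES = [
    "implementation",
    "system test",
    "acceptance test",
    "operation and maintenance",
]


def expected_errors_per_ksloc(initial_rate: float = 4.0,
                              decay: float = 0.5) -> dict[str, float]:
    """Expected errors detected per 1,000 lines of code in each phase.

    Each phase is assumed to find `decay` times as many errors as the
    phase before it, starting from `initial_rate` during implementation.
    """
    return {phase: initial_rate * decay ** i for i, phase in enumerate(PHASES)}


model = expected_errors_per_ksloc()
for phase, rate in model.items():
    print(f"{phase}: {rate} errors per KSLOC")
print("total:", sum(model.values()))  # total: 7.5
```

Because the pattern is described as independent of the absolute rates, a project with a lower starting rate could be modeled by changing only `initial_rate`.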


Assessing and refining. We 

consider each SEL project 
to be an experiment, in 
which we study some soft- 
ware method in detail. 
Generally, the subject of the 
study is a specific modifica- 
tion to the standard process 
— a process that obviously 
comprises numerous soft- 
ware methods. 

For example, the Cleanroom software methodology [5] has been
applied on four projects within the SEL, three of which have
been analyzed thus far.
Each project gave us addi- 
tional insight into the 
Cleanroom process and 
helped us refine the method 
for use in the FDD envi- 
ronment. After training 
teams in the Cleanroom 
methodology, we defined a 
modified set of Cleanroom- 
specific data to be collected. 
The teams studied the pro- 
jects to assess the impact 
that Cleanroom had on the 
process, as well as on mea- 
sures such as productivity 
and reliability. Figure 3 
shows the results of the 
three analyzed projects. 

The Cleanroom experi- 
ments required significant 
changes to the standard 
SEL development method- 
ology and thus extensive 
training, preparation, and 
careful study execution. As 
in all such experiments, we 
generated detailed experi- 
mentation plans that 
described the goals, the 
questions that had to be 
addressed, and the metrics 
that had to be collected to 
answer the questions. 
Because Cleanroom consists 
of many specific methods — 
such as box-structure de- 
sign, statistical testing, and 


rigorous inspections — each 
particular method had to be 
analyzed, along with the 
Cleanroom methodology 
itself. As a result of these 
projects, a slightly modified 
Cleanroom approach was 
deemed beneficial for small- 
er SEL projects. Anecdotal 
evidence from the recently 
completed fourth Clean- 
room project confirms the 
effectiveness of Cleanroom. 
The revised Cleanroom- 
process model was captured 
in a process handbook for 
future applications to SEL 
projects. We have analyzed 
and applied many other 


methodologies in this way. 

Packaging. Once we have 
identified beneficial meth- 
ods and technologies, we 
provide feedback for future 
projects by capturing the 
process in standards, tools, 
and training. The SEL has 
produced a set of standards 
for its own use that reflect 
the results of its studies. 
Such standards must continually evolve to capture modified characteristics of the process.
(The SEL typically updates its basic standard every five years.) Standards we have produced
include:

♦ Manager’s Handbook for Software Development [1],

♦ Recommended Approach to Software Development [2], and

♦ The SEL Relations and Models [3].


TABLE 1
EARLY SEL BASELINE

Project            Reuse       Mission cost*     Reliability
(number & name)    (percent)   (staff months)    (error/KSLOC)

1. GROAGSS         14          381               4.42
2. COBEAGSS        12          348               5.22
3. GOESAGSS        12          261               5.18
4. UARSAGSS        10          675               2.81
5. GROSIM          18          79                8.91
6. COBSIM          11          39                4.45
7. GOESIM          2?          96                1.72
8. UARSTELS        35          80                2.96

* Mission cost = cost of telemetry simulator + cost of AGSS (GRO = projects 1 + 5,
COBE = 2 + 6, GOES = 3 + 7, UARS = 4 + 8).


TABLE 2
CURRENT SEL BASELINE

Project            Reuse       Cost*             Reliability
(number & name)    (percent)   (staff months)    (error/KSLOC)

1. EUVEAGSS        18          155               1.22
2. SAMPEX          83          77                0.76
3. WINDPOLR        18          476               n/a†
4. EUVETELS        96          36                0.41
5. SAMPEXTS        95          21                0.48
6. POWITS          69          77                2.39
7. TOMSTELS        97          n/a‡              0.23
8. FASTELS         92          n/a‡              0.69

* Mission cost = cost of telemetry simulator + cost of AGSS (GRO = projects 1 + 5,
COBE = 2 + 6, GOES = 3 + 7, UARS = 4 + 8).
† Excluded because it used the Cleanroom development methodology, which counts errors
differently.
‡ Total mission cost for TOMS and FAST cannot be calculated because AGSSs are
incomplete (they are not included in the cost baseline).

In addition to the evolv- 
ing development standards, 
policies, and training mater- 
ial, successful packaging 
includes generating experi- 


ment results in the form of 
post-development analysis, 
formal papers, and guide- 
books for applying specific 
software techniques. 

IMPACT OF SEL 

Our studies have invol- 
ved many technologies, 
ranging from development 


and management practices 
to automation aids and 
technologies that affect the 
full life cycle. We have col- 
lected and archived detailed 
information so we can assess 
the impact of technologies 
on both the software 
process and product. 

Product impact. To deter- 
mine the effect of sustained 
SEL efforts as measured 
against our major goals, we 
routinely compare groups 
of projects developed at dif- 
ferent times. Projects are 
grouped on the basis of 
size, mission complexity, 
mission characteristics, lan- 
guage, and platform. On 
these characteristic pro- 
jects, we compared defect 
rates, cost, schedule, and 
levels of reuse. The reuse 
levels were studied carefully 
with the full expectation 
that there would be a corre- 
lation between higher reuse 
and lower cost and defect 
rates. These characteristic 
projects become our “baselines.”
Table 1 shows an
early baseline — eight pro- 
jects completed between 
1985 and 1989. These pro- 
jects were all ground-based 
attitude-determination and 
-simulation systems ranging 
in size from 50,000 to 
150,000 lines of code that 
were developed on large 
IBM mainframes. Each was 
also a success, meeting mis- 
sion dates and requirements 
within acceptable cost. 
Table 2 shows the current 
SEL baseline, which com- 
prises seven similar projects 
completed between 1990 
and 1994. 

As the tables show, the 
early baseline projects had a 
reliability rate that ranged 


from 1.7 to 8.9 errors per 
1,000 lines of code, with an 
average rate of 4.5 errors. 
The current baseline pro- 
jects had a reliability rate 
ranging from 0.2 to 2.4 
errors per 1,000 lines of 
code, with an average rate 
of 1 error. This is about a 
75 percent reduction over
the eight-year period. 
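
The averages quoted above can be checked against the reliability columns of Tables 1 and 2. The following is a sketch under stated assumptions: the values are copied as they appear in the tables, the one entry listed as n/a (WINDPOLR) is left out, and the variable names are illustrative.

```python
from statistics import mean

# Reliability (errors per KSLOC) as listed in Table 1.
early = [4.42, 5.22, 5.18, 2.81, 8.91, 4.45, 1.72, 2.96]
# Reliability as listed in Table 2; WINDPOLR is n/a and therefore omitted.
current = [1.22, 0.76, 0.41, 0.48, 2.39, 0.23, 0.69]

early_avg = mean(early)
current_avg = mean(current)
reduction = 1 - current_avg / early_avg

print(f"early average:   {early_avg:.2f}")    # early average:   4.46
print(f"current average: {current_avg:.2f}")  # current average: 0.88
print(f"reduction:       {reduction:.0%}")    # reduction:       80%
```

With the rounded averages used in the text (4.5 and 1), the same formula gives roughly 78 percent, so the unrounded table values are consistent with, and slightly more favorable than, the "about 75 percent" figure.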

The dramatic increase in
our reuse levels, aided by
experimentation with techniques
such as object-oriented
development and domain-engineering
concepts, has been a major
contributor to improved
project cost and quality.
Reuse, along with increased 
productivity, also con- 
tributed to a significant 
decrease in project cost. We 
examined selected missions 
from the two baselines and 
found that, although the 
total lines of code per mis- 
sion remained relatively 
equal, the total mission cost 
decreased significantly.
Mission cost in the early
baseline ranged from
357 to 755 staff-months,
with an average of 490. The
current baseline projects 
had costs ranging from 98 
to 277 staff-months with an 
average of 210. This is a 
decrease in average cost per 
mission of more than 50 
percent over the eight-year 
period. This reduction 
occurred despite the 
increased mission complexi- 
ty, shown in Table 3. 
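
The early-baseline mission costs can be reproduced from Table 1 by pairing each AGSS with its telemetry simulator, as the table footnote describes. This is a minimal sketch: the variable names are illustrative, and the GOES pairing of projects 3 and 7 is an assumption based on the project names (it does reproduce the 357 staff-month minimum quoted above). The current-baseline average of 210 is taken from the text rather than recomputed.

```python
# Cost in staff-months for the early-baseline projects (Table 1),
# keyed by project number.
early_cost = {1: 381, 2: 348, 3: 261, 4: 675, 5: 79, 6: 39, 7: 96, 8: 80}

# Each mission pairs an AGSS with its telemetry simulator.
# Assumption: GOES = projects 3 + 7 (GOESAGSS and the GOES simulator).
missions = {"GRO": (1, 5), "COBE": (2, 6), "GOES": (3, 7), "UARS": (4, 8)}

mission_cost = {
    name: early_cost[agss] + early_cost[simulator]
    for name, (agss, simulator) in missions.items()
}
early_avg = sum(mission_cost.values()) / len(mission_cost)

current_avg = 210  # average quoted in the text for the current baseline
reduction = 1 - current_avg / early_avg

print(mission_cost)  # {'GRO': 460, 'COBE': 387, 'GOES': 357, 'UARS': 755}
print(f"early average: {early_avg}")   # early average: 489.75
print(f"reduction: {reduction:.0%}")   # reduction: 57%
```

The computed range (357 to 755) and average (about 490) match the figures in the text, and the resulting reduction is consistent with "more than 50 percent".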

TABLE 3
COMPARING INCREASE IN BASELINE COMPLEXITY

Attribute           Early SEL baseline       Current SEL baseline

Control             Spin stabilized          Three-axis stabilized
Sensors             1                        8 to 11
Torques             1                        2 to 3
Onboard computer    Analog simple control    Digital control
Telemetry           5                        12 to 15
Data rates          2.2 kbs                  32 kbs
Accuracy            1 degree                 0.02 degree


Process impact. The most significant changes in the SEL environment are illustrated by the
standards, training programs, and development approaches incorporated into the FDD process.
Although specific techniques and methods have had a measurable impact on a class of
projects, significant improvement to the software-development process, and an overall
change in the environment, has occurred because we have continuously incorporated detailed
techniques into higher level organizational processes.

The most significant 
process attributes that dis- 
tinguish our current pro- 
duction environment from 
that of a decade earlier 
include: 

♦ Process change and 
improvement has been 
infused as a standard business 
practice. All standards and 
training material now con- 
tain elements of our continu- 
ous-improvement approach 
to experimentation. 

♦ Measurement is now 
our way of doing business 
rather than an add-on to 
development. Measurement 
is as much a part of our 
software standards as docu- 
mentation. It is expected, 
applied, and effective. 

♦ Change is driven by 
process and product. As the 
process-improvement pro- 
gram matured over the 
years, our concern for prod- 
uct attributes grew to equal 
our concern for process 
attributes. Product goals are 
always defined before 
process change is infused. 
Measures of product are 
thus as important as those 
of process (if not more so). 

♦ Change is bottom-up. 
Although process-improve- 
ment analysts originally 
assumed they could work 
independently of develop- 


ers, we have realized over 
the years that change must 
be guided by development- 
project experience. Direct 
input from developers as 
well as measures extracted 
from development activities 
are key factors in change. 

♦ “People-oriented” 
technologies are empha- 
sized, rather than automa- 
tion. The most effective 
process changes are those 
that leverage the thinking of 
developers. These include 
reviews, inspections, Clean- 
room techniques, manage- 
ment practices, and inde- 
pendent-testing techniques 
— all of which are driven by 
disciplined programmers 
and managers. Automation 
techniques have sometimes 
provided improvement, but 
people-driven approaches 
have had farther reaching 
impacts. 

The SEL has invested
approximately 11 percent
of its total software
budget in process
improvement. This expense
includes project overhead, 
as well as overhead for data 
archiving and processing 
and process and product 
analysis. We have main- 
tained detailed records so 
we can accurately record 
and report process-improve- 
ment costs. 

Our investment in 
process-improvement has 
brought many benefits. The 
cost, defect rates, and cycle 
time of flight-dynamics 
software have decreased sig- 
nificantly since we started 
the program. Today, our 
software developers are 
building better software 


more efficiently — using 
many techniques and meth- 
ods considered experimen- 
tal only a few years ago. 
Their progress has been 
facilitated throughout by 
the SEL focus on defining 
organizational goals, ex- 
panding domain under- 
standing, and judiciously 
applying new technology, 
allowing the FDD to maxi- 
mize the lessons from local 
experience. ♦ 
ABSTRACT

The quality movement, which in recent years has had a dramatic impact on all industrial sectors, has
recently reached the systems and software industry. Although some concepts of quality management,
originally developed for other product types, can be applied to software, its specificity as a product that
is developed and not produced requires a special approach. This paper introduces a quality paradigm
specifically tailored to the problems of the systems and software industry.

Reuse of products, processes and experience originating from the system life cycle is seen today as a
feasible solution to the problem of developing higher quality systems at a lower cost. In fact, quality
improvement is very often achieved by defining and developing an appropriate set of strategic capabilities
and core competencies to support them. A strategic capability is, in this context, a corporate goal defined
by the business position of the organization and implemented by key business processes. Strategic
capabilities are supported by core competencies, which are aggregate technologies tailored to the specific
needs of the organization in performing the needed business processes. Core competencies are
non-transitional, have a consistent evolution, and are typically fueled by multiple technologies. Their
selection and development require commitment, investment and leadership.

The paradigm introduced in this paper for developing core competencies is the Quality Improvement
Paradigm, which consists of six steps:

1. Characterize the environment
2. Set the goals
3. Choose the process
4. Execute the process
5. Analyze the process data
6. Package experience

The process must be supported by a goal-oriented approach to measurement and control, and an
organizational infrastructure, called the Experience Factory. The Experience Factory is a logical and
physical organization distinct from the project organizations it supports. Its goal is development and
support of core competencies through capitalization and reuse of life cycle experience and products.

The paper introduces the major concepts of the proposed approach, discusses their relationship with other
approaches used in the industry, and presents a case in which those concepts have been successfully
applied.


This work was supported by NASA Grant NSG-5123 and by Hughes Applied 
Information Systems, Inc. 
1. INTRODUCTION


The presence of software in almost every activity and institution is a 
characteristic of our society. Our dependence on software becomes evident when 
software problems and related events make the headlines of newspapers. 
However, this dependency on software, although highly visible, is not yet well 
understood by the business community. Software is still too often perceived as 
the easiest part of a system, the part that can be easily modified and adapted to 
fit to the main business of the organization. 

This idea that "software is easy" or, ultimately, "cheap" is hard to eradicate, even 
when there is substantial evidence that it is not true anymore. In particular, there 
is a certain difficulty in dealing with software quality, both in terms of definition
(What is quality software?) and implementation of quality programs (How can 
we produce quality software?). 

The starting point of every discussion on software quality is the recognition that 
software is an industrial product whose quality can be managed in a similar way 
to the quality of other products or services. A software system is the result of the 
concurrent effort of teams of people working according to a traditional 
engineering paradigm (a conception phase followed by an implementation 
phase, very often with several iterations). In fact, we call "software engineering" 
the systematic approach to the development, operation and maintenance of 
software systems (and associated documentation and data). 

As with every industrial product, the quality of software is defined as "fitness 
for use" over its lifetime. Therefore, the goal of a quality management program is 
to incorporate quality into a software system in the most economically 
convenient way, i.e., by designing a high quality system. The challenge of 
software quality is to implement techniques and programs in order to fill the 
existing gap between demand and our ability to produce high-quality software 
in a cost-effective way. 

The software product, however, presents the following critical combination of 
characteristics: 

• Software is a logical aggregate of invisible parts : The quality of such 
aggregate depends on the appropriateness of the logical 
structuring of the parts and on a precise and easy-to-understand 
documentation of this structure; 

• Software is designed for user applications which are expected to evolve 
continuously : The quality of application software depends on the 
precise conceptual understanding of user needs, and on the
adaptability of design to a changing environment; good 
communication between designers and users, and user perception 
are essential components of good software design; 

• Software is developed and not produced: Each software product is like 
a prototype, therefore many statistical concepts that help us in 
measuring and controlling quality in industrial products do not 
apply completely to software products; 

• Software is a human based technology: The quality of the software 
product is dependent on the individuals involved, therefore 
appropriate use of individual skills, individual satisfaction and 
motivation are key issues in achieving substantial improvements in 
quality and productivity. 

We believe that the quality of a software system should and can be managed in 
two ways. First, the effectiveness of the software development process should be 
improved by reducing the amount of rework and reusing software artifacts 
across segments of a project or different projects. Second, plans for controlled, 
sustained, and continuous improvement should be developed and implemented 
based on facts and data. 

But software engineering does not make extensive use of quantitative data. 
Therefore software quality management is based on a very immature and 
unstable paradigm. A major problem is that many data regarding the quality of
a system can only be observed and measured when the system is implemented.
Unfortunately, at that stage the correction of a design defect requires the 
redesign of some, sometimes large and complex, components and is very 
expensive. In order to prevent the occurrence of expensive defects in the final 
product, quality management must focus on the early stages of the engineering 
process, in particular on the requirements analysis and design phases, and use 
quantitative data in order to record and support inspection and decision making. 
Those early stages are, however, the ones in which the process is less defined 
and controllable with quantitative data. Therefore, software engineering projects 
do not regularly collect data and build models based upon them. 

There are many software projects that can be considered successful from a quality
point of view; generally this means that the techniques and procedures applied 
in the project have been effective, in particular those aimed at assuring quality. 
The goal of quality management is to make this success repeatable in other 
projects, by transferring the knowledge and the experience that are at the roots 
of that success to the rest of the organization. Therefore, a software organization 
that manages quality should have, besides the quality assurance infrastructure 
associated with each project, a corporate infrastructure that links together and 
transcends the single projects by capitalizing on successes and learning from 
failures. 

Quality management and infrastructure, however, do not just happen; they must 
be planned and implemented by the organization through specific programs and 
investments. This paper is about the need for a strategic approach to software 
quality management, as a part of a corporate strategy for software, aimed at 
pursuing and improving quality as an organization and not as a group of 
individual projects. 

We will motivate the need for such an approach, discuss it in the context of some 
of the most relevant concepts developed by the management disciplines, and 
provide a framework for a solution, which has been applied in practice with 
convincing results. 

We believe there is no solution that can be mechanically transferred and applied 
to every organization (the famous "silver bullet"), and this applies also to the 
concepts presented in this paper. The proposed approach, however, can be used 
by every organization, after appropriate customization, in order to improve 
software quality in a controllable way. 

2. THE PROBLEM OF SOFTWARE QUALITY 


Quality is the totality of characteristics of a product or service "that bear on its 
ability to satisfy stated or implied needs" [ISO1]. It is a multidimensional concept 
that includes the entity of interest (the product or service), the viewpoint on that 
entity (the user, the producer, a regulatory agency, etc.) and the quality 
attributes of that entity (the characteristics that make it fit for use). A recent 
international standard [ISO3] identifies the following characteristics: 

• Functionality • Efficiency 

• Reliability • Maintainability 

• Usability • Portability 

In some cases, such as regulated environments in which some safety critical 
factors must be determined (aeronautics, nuclear power, etc.), these attributes 
are specified by a standard or a contract; but in the majority of cases they are 
identified and defined during the design process, and modified throughout the 
life cycle of the system. The ability of an organization to identify and define the 
quality attributes that are closer to the "stated or implied needs" of a user is the 
critical success factor in the market of the '90s. 


Figure 1 
Today the success of a software organization is measured by its 
cost/performance attributes: it delivers (or updates) the needed systems 
generally on time and without budget overruns. In the longer run, though, if we 
take into account today's market, characterized by shrinking budgets and 
increased global competition, we can expect, for the second half of the ’90s, that 
the most successful organizations will probably be the ones that have been able 
to converge to better levels of productivity and quality. The influence of 
international standards such as the ISO 9000 Series [ISO2] is already evident. 
Many organizations are now seeking registration and the ability to develop 
quality systems in compliance with the requirements of the standard. 
Registration, however, is a means and not an end: spending resources on 
developing a quality system without a quality improvement program that uses it 
to gain a competitive advantage would be a waste of money. This is why, along 
with ISO 9000 registration programs, we see quality improvement programs 
being started. We can expect that in a few years all this movement will lead to a 
higher quality baseline for all the software that is being purchased and 
developed around the world. On top of this baseline the organizations will be 
able to build their own quality management programs and their continuous 
improvement strategies. In this way quality will complete its transformation 
from problem (search for defects) to tool (defined processes) to business 
opportunity used to distinguish an organization from its competitors (Figure 1). 

At that point, the real advantage will come from the ability of the software 
organization to deliver solutions that not only satisfy, but also anticipate the 
needs of the system users, enhancing their business and adding a substantial 
amount of value to their products and services [Hamel and Prahalad, 1991]. 
Competition in the '90s is a more complex and dynamic playing field, in which 
the basic factors for success are the understanding of trends and the response to 
changing needs. The traditional rigidity of software organizations must be 
adapted to the new ground rules. New professional skills, beyond the traditional 
programmer/analyst/manager triangle, are necessary in order to capitalize on 
the experience of the organization and work on specific lines of business instead 
of developing isolated products. 

If we survey the approaches to software quality available to the industry, we see 
a variety of paradigms, mostly coming from the manufacturing industry. 

Some organizations apply to their software processes an improvement process 
based on the Shewhart-Deming Cycle [Deming, 1986]. This approach provides a 
methodology for managing change throughout the steps of a production process 
by analyzing the impact of those changes on the data derived from the process. 
The methodology is articulated in four phases: 

• Plan:  Define quality improvement goals and targets and 
         determine methods for reaching those goals; prepare 
         an implementation plan. 

• Do:    Execute the implementation plan and collect data. 

• Check: Verify the improved performance using the data 
         collected from the process and take corrective actions 
         when needed. 

• Act:   Standardize the improvements and install them into 
         the process. 


Some organizations use the Total Quality Management (TQM) approach, which 
is a derivative of the Plan/Do/Check/Act (PDCA) method applied to all business 
processes in the organization [Feigenbaum, 1991]. Actually, more than a specific 
method, TQM is a family of management philosophies based on the idea that 
quality is measured by the user of a product, and that everyone in the 
organization has specific responsibilities for the quality of the final outcome. 
Therefore, in TQM programs, quality improvements, identified during a 
preliminary characterization effort, are usually tried out by pilot groups and then 
institutionalized across the whole organization. The TQM approach usually 
results in the establishment of cross-functional quality improvement teams 
chartered to address specific quality improvements within a strategic quality 
plan developed by top management. 

A different approach is adopted by organizations that model their improvement 
on an external scale that is meant to represent the best practices in quality. The 
goals of the improvement program are, in this case, not internally generated but 
suggested by those best practices. A model of this kind, which is today very 
popular in both the USA and Europe, is the SEI Capability Maturity Model [SEI; 
Bootstrap] which measures the maturity of a software organization on the basis 
of its dependence on individual skills and on the presence of certain 
technologies. In a low maturity organization, the success of a task depends on 
the efforts of people involved in it, professionals and managers. Their ability to 
control risk, to solve or even prevent problems is the major asset of the 
organization. In a more mature organization, the success is based on the use of 
sound managerial and engineering techniques coordinated by a pervasive, well- 
defined set of processes for the execution of the needed tasks. At the highest 
level of maturity, the organization effectively capitalizes on its experiences and 
improves its processes. The improvement is achieved by bringing the 
organization through these levels of maturity. 

All these approaches, and variations on them, have been used by the software 
industry, with mixed outcomes. Some outstanding successes have been reported, 
such as the one shown in Figure 2 [Dion, 1993], by combining those approaches. 
The major problem with all these approaches is that they either do not deal 
specifically with the nature of the software product (Deming Cycle, TQM) or, if 
they do, they assume that there is a consistent picture of what a good software 
product or process is (SEI model). 

We argue that this is not enough for two reasons: the first one is that in order to 
be really effective a software quality program should deal with the nature of the 
software business itself; the second is that there is really no such thing as an 
explicit consistent picture of a good software product. 


Figure 2 


Raytheon Experience 

Costs 

• $1 million: Investment on the improvement program for each year 
(1987-1992) 

Benefits 

• $15.8 million: Rework costs eliminated 

Return on investment 

• 7.7 : 1 in 1990 

Changes in % project time by cost type from 1988 to 1990 

• Performance: Cost of building it right the first time, from 34% to 55%; 

• Non conformance: Cost of rework, from 44% to 18%; 

• Appraisal: Cost of testing, from 15% to 15%; 

• Prevention: Cost of preventing non-conformance, from 7% to 12%. 
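
The changes listed in Figure 2 can be read as a shift of effort away from rework and toward first-time work. The short Python sketch below is only an illustrative reading of those numbers: the percentages are copied from the figure, while the dictionary layout and the function names are hypothetical and not part of the Raytheon report. It computes the change of each cost type in percentage points and checks that each year's breakdown totals 100%.

```python
# Percent of project time by cost type, as (1988, 1990), copied from Figure 2.
cost_shares = {
    "performance": (34, 55),
    "non_conformance": (44, 18),
    "appraisal": (15, 15),
    "prevention": (7, 12),
}


def shift_in_points(shares):
    """Return the 1988-to-1990 change of each cost type in percentage points."""
    return {name: after - before for name, (before, after) in shares.items()}


def column_totals(shares):
    """Sum each year's column; a complete breakdown should total 100."""
    total_1988 = sum(before for before, _ in shares.values())
    total_1990 = sum(after for _, after in shares.values())
    return total_1988, total_1990


shifts = shift_in_points(cost_shares)
print(column_totals(cost_shares))
print(shifts)
```

Under this reading, the 26 points removed from rework reappear as 21 points of first-time work and 5 points of prevention, while the share of appraisal is unchanged.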


On one hand, if we look at processes and technologies in isolation, as in the 
Plan/Do/Check/Act and TQM approaches, we have very little chance to get to 
the right level of abstraction that provides reusable units across different 
processes. Those approaches do not really build "model abstractions" because 
they manipulate the process explicitly. For instance: if we apply TQM to the 
order entry process, we have well defined elementary actions performed to enter 
an order. We can describe them with a flow chart and analyze the process, apply 
changes and assess their impact. We will very soon have enough instances of that 
process to build a control chart and bring the process under control. Unfortunately, the 
same approach cannot be used on a software process (e.g., structured design), 
which cannot be reduced to elementary units and is not replicated many times in 
a short period. 
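
To make the contrast concrete, the following Python sketch shows what a control chart for a frequently repeated process such as order entry could look like. It is a minimal illustration under stated assumptions, not a procedure taken from the approaches discussed above: it uses an individuals chart whose limits are the mean plus or minus 2.66 times the average moving range, and the order entry times and function names are hypothetical.

```python
from statistics import mean


def individuals_chart_limits(samples):
    """Return (lower, center, upper) control limits for an individuals chart.

    Limits are center +/- 2.66 * average moving range, the usual constant
    for charts built on single observations.
    """
    if len(samples) < 2:
        raise ValueError("need at least two observations")
    center = mean(samples)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(samples, samples[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def out_of_control(samples, limits):
    """Return the indices of observations outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical minutes needed to enter each of twelve orders (baseline period).
baseline = [12, 11, 13, 12, 14, 12, 11, 13, 12, 13, 12, 11]
limits = individuals_chart_limits(baseline)

# Hypothetical new observations checked against the baseline limits.
new_orders = [12, 13, 25, 11]
print(out_of_control(new_orders, limits))
```

The sketch depends on having many comparable observations in the baseline; a software process that is executed a few times per year does not supply them, which is the limitation described above.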

On the other hand, if we base our judgment upon an external model, as in the 
SEI and similar approaches, we might lose characteristics that make an 
organization's environment "special." Those characteristics are, in many cases, at 
the roots of the competitive advantage of that organization, therefore their loss is 
very damaging for the improvement program. 

The approach that will be presented in the next sections of this paper is an 
attempt to learn from the successes obtained through the different paradigms 
sketched in this section, and to avoid the problems encountered in their 
application to software environments. It rests on the lean enterprise concept 
[Womack, 1989] by concentrating production and resources on value-added 
activities that represent the critical business processes of the organization. Such 
processes, after having been recognized, are conceptually redesigned in a 
modular way and associated with models, data, techniques and tools, in order to 
reuse them according to the needs and characteristics of specific projects. Total 
quality management [Feigenbaum, 1991] and concurrent engineering [Dewan 
and Riedl, 1993] can be used in order to keep the structure efficient, responsive 
to the needs of any external entity (customer or supplier), and to make it rest 
upon partnership and participation, with many feedbacks and measures of the 
effectiveness of communication. 

3. TOWARDS A MATURE SOFTWARE ORGANIZATION 


If we analyze carefully some of the most successful and trend-setting business 
stories of the last 10 years [Stalk, Evans and Shulman, 1992], we can ascribe the 
reported successes to the application of four basic principles: 

1. Business processes are the building blocks of the corporate 
strategy. 

2. Competitive success depends on understanding and transforming 
the key business processes into strategic capabilities. 

3. Strategic capabilities are created by sustained long-term 
investments in a support infrastructure that links together and 
transcends the business units. 

4. A capability-based strategy must be sponsored by the top 
management of the corporation. 

It is important to understand these four principles in the context of a software 
organization. 

The first principle sets the focus on business processes: this is consistent with the 
current tendency to emphasize the role of software processes in a successful 
project. Software is a logical aggregation and an intellectual product, which is, 
therefore, strongly dependent on the processes executed for developing or 
maintaining it. The analysis of those processes and the ability to reuse them in 
the appropriate context are a key competitive factor for every software 
organization. The corporate strategy must focus on identification and 
characterization of the key business processes used in developing and 
maintaining software, so that the business units, relieved from process related 
concerns, can focus more on the individual systems and services that are 
developed and delivered to individual clients. 

The second principle is about "strategic understanding" of business processes. 
This means that the organization must understand its key business processes 
sufficiently to transform them into reusable units available to all its business 
units where needed. Not every process used in the organization has the 
characteristics of criticality that make it worthy of being transformed into a 
strategic capability: it is only from the analysis of the relationship between 
software processes and the mission of the organization that we can obtain a 
strategic level of understanding and a consolidated hypothesis of what should 
become a strategic capability. A system developer or integrator, for instance, 
produces software in order to deliver services to a particular group of users (e.g., 
electronic messaging). In this case a good cost/benefit ratio for the system or 
service is probably the most crucial issue. Therefore, the process of making 
acceptable estimates and developing a plan based on them has a criticality 
definitely higher than the process of assuring the highest possible reliability. On 
the other hand, for a manufacturer of systems dependent on software (e.g., 
cellular phones) the cost/benefit ratio for software is distributed over a large 
number of products and therefore not extremely crucial for the single software 
package. Therefore, the process of assuring reliability has a higher criticality in 
comparison with the ability of making acceptable estimates of software costs. 

The third and the fourth principles call for long-term investments and top 
management sponsorship, which translates into a permanent structure that 
develops and supports the reuse of the strategic capabilities. This is particularly 
new for the software industry, which is, in its large majority, driven by its 
business units and, therefore, has little ability to capitalize on experiences and 
capabilities. The required permanent structure is designed to provide a double 
support cycle: 

• Control cycle: Support is provided to the everyday operation of 
software projects by comparing their current performance with the 
normal performance of similar projects; 

• Capitalization cycle: Support is provided to future projects by 
continually learning from past experience and packaging this 
experience in a reusable way. 

The development of strategic capabilities and competencies to support them, 
which is the key to all four of the presented principles, has, in the case of 
software, some basic requirements: 

1. The organization must understand the software process and 
product. 

2. The organization must define its business needs and its concept of 
process and product quality. 

3. The organization must evaluate every aspect of the business 
process, including previous successes and failures. 

4. The organization must collect and use information for project 
control. 

5. Each project should provide information that allows the 
organization to have a formal quality improvement program in 
place, i.e. the organization should be able to control its processes, 
to tailor them to individual project needs and learn from its own 
experiences. 

6. Competencies must be built in critical areas of the business by 
packaging and reusing clusters of experience relevant to the 
organization's business. 

Part of the problem with the software business is the lack of understanding of 
the nature of software and software development. To some extent, software is 
different from most products. First of all, software is developed in the creative, 
intellectual sense, rather than produced in the manufacturing sense, i.e., each 
software system is developed rather than manufactured. Second, there is a non- 
visible nature to software. Unlike an automobile or a television set, it is hard to 
see the structure or the function of software, or to reason about it in a 
straightforward way. Therefore, the development of strategic capabilities in 
software requires understanding, model building and continuous feedback from 
the process. 

This means that we must rethink the software business and expand our focus to 
a new set of problems and the techniques needed to solve them. Unfortunately, 
the traditional orientation of a software project is based on a case-by-case 
problem solving attitude; the development of strategic capabilities is based, 
instead, on an experience reuse and organizational sharing attitude. Figure 3 
outlines the traditional focus of software development and problem solving, 
along with the expanded focus, proposed here for experience reuse. 

The obvious question to be asked now is: are there any practical models that can 
be used in order to develop a strategy with the new focus? Such practical models 
can be software organizations that have tried to implement a capability-based 
strategy (or at least parts of it) and have carefully collected lessons learned and 
data, empirical studies in-the-large based on the scientific method (observe, 
formulate a hypothesis, measure and analyze, validate/refute the hypothesis) 
that have published their findings in a workable form, or controlled experiments 
in-the-small. 

Figure 3 


Traditional Focus                    New Extended Focus 

• Delivering specific products       • Developing capabilities 
  and services 

• Decomposing a complex              • Unifying different solutions into 
  problem into simpler ones            more general ones 

• Design/implementation              • Analysis/Synthesis process 
  process 

• Instantiation                      • Generalization and formalization 

• Validation and verification        • Experimentation 


In Section 5 we will illustrate an experience that we, together with a large part of 
the software engineering community, consider a practical model. The reason for 
choosing this one, besides the personal involvement of the authors of this paper 
with it, which provides us with considerable insight, is its almost unique blend 
of an organizational strategy aimed at continuous improvement, of a data-based 
approach to decision making, of an experimental paradigm, along with many 
years of continuous operation and data collection. 

4. A STRATEGY FOR IMPROVEMENT 


This section will present a strategy for improvement based on the development 
of strategic capabilities. 

The main concept of this strategy is the central role played by a methodological 
framework addressing the development and improvement of strategic 
capabilities in the form of reusable experience. This framework will be presented 
and discussed in the form of a process called "Quality Improvement Paradigm" 
[Basili, 1985]. In order to manage this conceptual framework we will need two 
tools: 

• A control tool: The goal-oriented approach to measurement 
addressing the issue of supporting the improvement process with 
quantitative information [Basili and Weiss, 1984]; 

• An organizational tool: An infrastructure aimed at capitalization 
and reuse of software experience and products [Basili, 1989]. 

In the next section we will see the methodological framework and the associated 
tools at work in a specific and practical example. 


4.1 THE QUALITY IMPROVEMENT PARADIGM 


A strategic capability is for us a corporate goal defined by the business position of 
the organization and implemented by key business processes. Strategic 
capabilities of software organizations are identified by the analysis of the 
categories of products/services that the organization intends to deliver in the 
future, of the level of project control needed in order to deliver those 
products/services at the appropriate level of quality, and of the strengths and 
weaknesses of the organization. Examples of strategic capabilities are 

• Certify the reliability of the system that is being released for 
acceptance by the customer; 

• Have a design-to-cost process, i.e., tailor the design of a software 
system to the amount of available resources (money, people, 
computers, etc.); 

• Use flexible standards, i.e., standards that can, case by case, be 
tailored to the needs and the characteristics of each project; 

• Have a short cycle-time, i.e., reduce the elapsed time from the 
identification of a solution to its deployment. 

Strategic capabilities are always supported by core competencies, which are 
aggregate technologies tailored to the specific needs of the organization in 
performing the needed business processes. For instance: in order to certify the 
reliability of a system, an organization needs to master the quality assurance 
process owning competencies such as statistical testing and reliability modeling; 
in order to design to cost the organization must use flexible processes owning 
competencies such as process modeling and control, and concurrent engineering. 

Core competencies have characteristics that distinguish them from simple 
technologies or clusters of technologies: 

• They are non-transitional: although sometimes they appear to be 
fashionable concepts, they don't come and go; 

• They have a consistent evolution: a paradigm for their interpretation 
and application is built over time and some consensus is generated 
throughout the user community; 

• They require commitment, investment and leadership; 

• They are typically fueled by and work with multiple technologies; 

• They generally support multiple product/service lines. 

The acquisition of core competencies that support the strategic capabilities is the 
goal of the process we will present in this section. If a competency is a key factor 
in a strategic capability, the organization must be sure to own, control and 
properly maintain this competency at state-of-the-art level, and know how to 
tailor it to the characteristics of specific projects and business units. 

Strategic capabilities come into the improvement process as constituents of 
characteristics and goals. On the basis of the characteristics of the environment 
and of the transformation of those capabilities into specific goals for the software 
organization, the improvement paradigm provides a disciplined way to build 
the competencies necessary to support those capabilities. 

The improvement process is articulated into the following six steps (Figure 4): 

1. Characterize: Understand the environment based upon available 
models, data, intuition, etc. Establish baselines with the existing 
business processes in the organization and characterize their 
criticality. 

2. Set Goals: On the basis of the initial characterization and of the 
capabilities that have a strategic relevance to the organization, set 
quantifiable goals for successful project and organization 
performance and improvement. The reasonable expectations are 
defined based upon the baseline provided by the characterization 
step. 


Figure 4 



3. Choose Process: On the basis of the characterization of the 
environment and of the goals that have been set, choose the 
appropriate processes for improvement, and supporting methods 
and tools, making sure that they are consistent with the goals that 
have been set. 

4. Execute: Perform the processes constructing the products and 
providing project feedback based upon the data on goal 
achievement that are being collected. The processes will be 
executed according to the needs dictated by the problem and to the 
process chosen in the previous phase. 

5. Analyze: At the end of the execution, analyze the data and the 
information gathered to evaluate the current practices, determine 
problems, record findings, and make recommendations for future 
project improvements. 

6. Package: Consolidate the experience gained in the form of new, or 
updated and refined, models and other forms of structured 
knowledge gained from this and prior projects, and store it in an 
experience base so it is available for future projects. 
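
As an illustration only, the six steps above can be sketched as one pass over a very small experience base. Everything in the Python sketch below is an assumption made for the example: the function names, the use of rework percentage as the single measure, the 10% improvement target, and the data values are hypothetical and are not prescribed by the paradigm.

```python
from statistics import mean


def characterize(experience_base):
    """Step 1: build a baseline from data packaged by earlier projects."""
    return mean(experience_base["rework_percent"])


def set_goals(baseline, improvement=0.10):
    """Step 2: set a quantifiable goal relative to the baseline."""
    return baseline * (1 - improvement)


def choose_process(experience_base, goal):
    """Step 3: pick the known process whose expected result is closest to the goal."""
    models = experience_base["process_models"]
    return min(models, key=lambda name: abs(models[name] - goal))


def execute(process, observed_rework_percent):
    """Step 4: run the project; here the measured result is simply supplied."""
    return {"process": process, "rework_percent": observed_rework_percent}


def analyze(result, goal):
    """Step 5: compare the collected data with the goal and record a finding."""
    return {**result, "goal": goal, "goal_met": result["rework_percent"] <= goal}


def package(experience_base, finding):
    """Step 6: store the updated data and model so future projects can use them."""
    experience_base["rework_percent"].append(finding["rework_percent"])
    experience_base["process_models"][finding["process"]] = finding["rework_percent"]
    return experience_base


# Hypothetical experience base: rework as a percent of project effort.
experience_base = {
    "rework_percent": [40.0, 44.0, 42.0],
    "process_models": {"inspections": 36.0, "ad_hoc_review": 43.0},
}

baseline = characterize(experience_base)         # mean of the three past projects
goal = set_goals(baseline)                       # 10% below the baseline
process = choose_process(experience_base, goal)  # closest expected result to the goal
finding = analyze(execute(process, 35.0), goal)
experience_base = package(experience_base, finding)
print(process, finding["goal_met"], len(experience_base["rework_percent"]))
```

Because the packaging step changes the experience base, the next pass starts from a different baseline, which is what makes the process iterative rather than a one-time assessment.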

The Quality Improvement Paradigm implements the two major cycles, control 
and capitalization, introduced in section 3: 

• The project feedback cycle (control cycle) is the feedback that is 
provided to the project during the execution phase: whatever the 
goals of the organization, the project should use its resources in the 
best possible way; therefore quantitative indicators at project and 
task level are useful in order to prevent and solve problems, 
monitor and support the project, realign the process with the goals; 

• The corporate feedback cycle (capitalization cycle) is the feedback 
that is provided to the organization and has the purpose of 

. Providing analytical information about project 
performance at project completion time by comparing 
the project data with the nominal range in the 
organization and analyzing concordance and 
discrepancy; 

. Understanding what happened, capturing experience 
and devising ways to transfer that experience across 
domains; 

. Accumulating reusable experience in the form of 
software artifacts that are applicable to other projects 
and are, in general, improved based on the 
performed analysis. 
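
The first purpose of the corporate feedback cycle, comparing project data with the nominal range in the organization, can be sketched as follows. This is a minimal example under stated assumptions: the nominal range is taken to be a 25% band around the median of past projects, which is an arbitrary choice made for illustration, and the measures, values, and function names are hypothetical.

```python
from statistics import median


def nominal_range(past_values, spread=0.25):
    """Return (low, high) bounds around the median of past project values.

    The 25% band is an arbitrary illustrative choice, not a recommended rule.
    """
    center = median(past_values)
    return center * (1 - spread), center * (1 + spread)


def compare_with_nominal(project, history):
    """Label each measure of a completed project as concordant or discrepant."""
    report = {}
    for measure, value in project.items():
        low, high = nominal_range(history[measure])
        report[measure] = "concordant" if low <= value <= high else "discrepant"
    return report


# Hypothetical data from completed projects in the same organization.
history = {
    "errors_per_kloc": [4.0, 5.0, 6.0, 5.5],
    "effort_staff_months": [100, 120, 90, 110],
}
completed_project = {"errors_per_kloc": 5.2, "effort_staff_months": 160}
report = compare_with_nominal(completed_project, history)
print(report)
```

A discrepant measure is not a verdict by itself; it marks the place where the analysis should look for an explanation before the experience is packaged.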

The execution of the quality improvement paradigm by an organization is 
structured as an iterative process that repeatedly characterizes the environment, 
sets appropriate goals and chooses the process in order to achieve those goals, 
then proceeds with the execution and the analytical phases. At each iteration 
characteristics and goals are redefined and improved (Figure 5). 


Figure 5 



The reader has probably realized at this point that there is a deep similarity 
between the QIP and the Total Quality Management (TQM) philosophy. Figure 6 
outlines some other correspondences between the two models. 

The relationship between the QIP and the Plan/Do/Check/Act cycle is even 
closer. Both approaches are an offspring of the modern scientific method: first a 
hypothesis is generated, then an experiment is planned in order to validate the 
hypothesis, data are collected and analyzed, and the hypothesis is evaluated. 
The concept of feedback is also critical to both approaches: during the execution 
of the processes that have been planned and at the end of the execution data are 
analyzed in order to understand the impact of the changes introduced into the 
process. The real major difference between the two approaches appears at the 
end of the cycle: the PDCA approach incorporates the changes into the normal 
operation of the process, while the QIP develops a series of models that reflect 
the changes. This is due, as we said before, to the relatively smaller number of 
process instances that we have in the case of a software process, when compared 
with a manufacturing process. 

Figure 6 

[Correspondences between TQM and the QIP; the two-column layout did not 
survive and only the following entries are recoverable.] 

• Focuses on customer satisfaction and partnership for quality 

• Customers are both external and [...] 

• Capitalizes on project achievements 

• [...] software process and product 

• Bases decision making on facts 

• Bases decision making on facts and data collected across 
different projects 


4.2 THE GOAL-ORIENTED MEASUREMENT 

The Goal/Question/Metric Approach [Basili and Weiss, 1984; Basili and 
Rombach, 1988] provides a method to identify and control key business 
processes in a measurable way. It is used to define metrics over the software 
project, process and product in such a way that the resulting metrics are tailored 
to the organization and to its goals, and reflect the quality values of the different 
viewpoints (developers, users, operators, etc.). 

The result of the application of the Goal/Question/Metric Approach is the 
specification of a measurement system targeting a particular set of issues and a 
set of rules for the interpretation of the measurement data. The resulting 
measurement model has three levels: 

1. Conceptual level (GOAL): A goal is defined for an object, for a 
variety of reasons, with respect to various models of quality, from 
various points of view, relative to a particular environment. Objects 
of measurement include 

• Products: Artifacts, deliverables and documents that 
are produced during the system life cycle; e.g., 
specifications, designs, programs, test suites. 

• Processes: Software related activities normally 
associated with time; e.g., specifying, designing, 
testing, interviewing. 

• Resources: Items used by processes in order to 
produce their outputs; e.g., personnel, hardware, 
software, office space. 

• Knowledge objects: Models of the behavior of other 
items derived from past observations; e.g., resource 
models, reliability models. 

2. Operational level (QUESTION): A set of questions is used to define 
in a quantitative way the goal and to characterize the way the 
specific goal is going to be interpreted based on some 
characterizing model. Questions try to characterize the object of 
measurement (product, process, resource, knowledge object) with 
respect to a selected quality issue and to determine its quality from 
the selected viewpoint. 

3. Quantitative level (METRIC): A set of data is associated with every 
question in order to answer it in a quantitative way. 


Figure 7 



A GQM model is a hierarchical structure (Figure 7) starting with a goal 
(specifying purpose of measurement, object to be measured, issue to be 
measured, and viewpoint from which the measure is taken). In order to give an 
example of application of the Goal/Question/Metric approach, let's suppose we 
want to improve the timeliness of change request processing during the 
maintenance phase of the life-cycle of a system. The resulting goal will specify a 
purpose (improve), a process (change request processing), a viewpoint (project 
manager), and a quality issue (timeliness) (Figure 8). The goal is refined into 
several questions that usually break down the issue into its major components. 
The goal of the example can be refined to a series of questions, about, for 
instance, turn-around time and resources used. Each question is then refined 
into metrics. The questions of our example can, for instance, be answered by 
metrics comparing specific turn-around times with the average ones. The same 
metric can be used to answer different questions under the same goal. Several 
GQM models can also have questions and metrics in common, making sure that, 
when the measure is actually taken, the different viewpoints are taken into 
account correctly (i.e., the metric might have different values when taken from 
different viewpoints). The Goal/Question/Metric Model of our example is 
shown in Figure 8. 


Figure 8 


Goal        Purpose            Improve 
            Issue              the timeliness of 
            Object (process)   change request processing 
            Viewpoint          from the project manager's viewpoint 

Question    Is the performance of the process improving? 

Metrics     Current average turnaround time 
            Baseline average turnaround time 
            Subjective rating of manager's satisfaction 

Question    Is the distribution of resources changing? 

Metrics     Percent effort spent on problem analysis 
            Percent effort spent on solution identification 
            Percent effort spent on solution implementation 
            Percent effort spent on solution testing 
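
The hierarchy of Figure 8 can be written down as a small data structure. The following is only an illustrative sketch, not part of the original GQM definition: the class names `Goal`, `Question`, and `Metric`, and the helper methods, are assumptions made for this example. It shows the three levels and how the metrics attached to a goal can be collected across its questions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    """Quantitative level: one piece of data that helps answer a question."""
    name: str


@dataclass
class Question:
    """Operational level: refines the goal; answered by one or more metrics."""
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    """Conceptual level: purpose, issue, object, and viewpoint."""
    purpose: str
    issue: str
    obj: str
    viewpoint: str
    questions: list[Question] = field(default_factory=list)

    def statement(self) -> str:
        return (f"{self.purpose} the {self.issue} of {self.obj} "
                f"from the {self.viewpoint}'s viewpoint")

    def metrics(self) -> set[Metric]:
        # The same metric may answer several questions; a set keeps one copy.
        return {m for q in self.questions for m in q.metrics}


# The change request example of Figure 8.
effort_metrics = [
    Metric(f"Percent effort spent on {activity}")
    for activity in ("problem analysis", "solution identification",
                     "solution implementation", "solution testing")
]
goal = Goal(
    purpose="Improve",
    issue="timeliness",
    obj="change request processing",
    viewpoint="project manager",
    questions=[
        Question("Is the performance of the process improving?", [
            Metric("Current average turnaround time"),
            Metric("Baseline average turnaround time"),
            Metric("Subjective rating of manager's satisfaction"),
        ]),
        Question("Is the distribution of resources changing?", effort_metrics),
    ],
)

print(goal.statement())
for question in goal.questions:
    print(" ", question.text, "->", len(question.metrics), "metrics")
```

Making `Metric` immutable and hashable is a design choice for this sketch: it lets several questions, or several GQM models, share the same metric object without duplicating it.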





In conclusion, we can also use the Goal/Question/Metric Approach for 
long-range corporate goal setting and evaluation. The evaluation of a project can be 
enhanced by analyzing it in the context of several other projects. We can expand 
our level of feedback and understanding by defining the appropriate synthesis 
procedure for transforming specific, valuable information into more general 
packages of experience. As a part of the Quality Improvement Paradigm, we can 
learn more about the definition and application of the Goal/Question/Metric 
Approach in a formal way, just as we would learn about any other experiences. 


4.3 EXPERIENCE FACTORY: THE CAPABILITY-BASED ORGANIZATION 


The concept of the Experience Factory [Basili, 1989] has been introduced in order 
to institutionalize the collective learning of the organization that is at the root of 
continuous improvement and competitive advantage. 

Reuse of experience and collective learning cannot be left to the imagination of 
individual, very talented managers: in a capability-based organization they become 
a corporate concern like the portfolio of businesses or the company assets. The 
experience factory is the organization that supports reuse of experience and collective 
learning by developing, updating and providing upon request clusters of competencies to 
the project organizations. We call these clusters of competencies experience 
packages. The project organizations supply the experience factory with their 
products, the plans, processes and models used in their development, and the 
data gathered during development and operation; the experience factory 
transforms them into reusable units and supplies them to the project 
organizations, together with specific support made of monitoring and 
consulting. 

The experience factory organization can be a logical and/or physical 
organization, but it is important that its activities are clearly identified and made 
independent from those of the project organization. 

As we have seen at the beginning of this paper, the packaging of experience is 
based on tenets and techniques that are different from the problem solving 
activity used in project development. Therefore the projects and the factory will 
have different process models: each project will choose its process model based 
upon the characteristics of the software product that will be delivered, while the 
experience factory will define (and change) its process model based upon the 
nature of the work, and organizational and performance issues. 

Figure 9 provides a high-level picture of the experience factory organization and 
highlights activities and information flows among the component 
sub-organizations. 

The project organization, whose goal is to produce and maintain software, 
provides the experience factory with project and environment characteristics, 
development data, resource usage information, quality records, and process 
information. This provides feedback on the actual performance of the models 
processed by the experience factory and utilized by the project. 

The experience factory provides direct feedback to each project, together with 
goals and models tailored from similar projects. It also produces and provides 
upon request baselines, tools, lessons learned, and data, parametrized in some 
form in order to be adapted to the specific characteristics of a project. The 
support personnel sustain and facilitate the interaction between developers and 
analysts, by saving and maintaining the information, making it efficiently 
retrievable, and controlling and monitoring the access to it. 


Figure 9 


The main product of the experience factory is a set of core competencies 
packaged as aggregates of technologies. Figure 10 shows some examples of core 
competencies and the corresponding aggregation of technologies: 

Core competencies can be implemented in a variety of formats. We call these 
formats "experience packages". Their content and structure vary based upon the 
kind of experience clustered in them. There is, generally, a central element that 
determines what the package is: a software life cycle product or process, a 
mathematical relationship, an empirical or theoretical model, a data base, etc. 
We can use this central element as identifier of the experience package and 
produce a taxonomy of experience packages based upon the characteristics of 
this central element; e.g.: 

• Product packages: Programs, Architectures, Designs; 

• Tool packages: Constructive and Analytic Tools; 

• Process packages: Process Models, Methods; 

• Relationship packages: Cost and Defect Models, Resource Models, 
etc.; 

• Management packages: Guidelines, Decision Support Models; 

• Data packages: Defined and validated data, Standardized data, etc. 


Figure 10 
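
One way to picture the taxonomy above is as a small catalog keyed by the kind of central element. The sketch below is an assumption-laden illustration, not a description of the SEL's actual experience base: the names `PackageKind`, `ExperiencePackage`, and `ExperienceBase`, the `context` field, and the sample entries are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class PackageKind(Enum):
    """The six kinds of experience package listed in the text."""
    PRODUCT = "product"
    TOOL = "tool"
    PROCESS = "process"
    RELATIONSHIP = "relationship"
    MANAGEMENT = "management"
    DATA = "data"


@dataclass(frozen=True)
class ExperiencePackage:
    kind: PackageKind
    central_element: str  # identifies the package, e.g. a model or a design
    context: str          # hypothetical field: environment the experience came from


class ExperienceBase:
    """Stores packages and provides them to projects upon request."""

    def __init__(self):
        self._by_kind = defaultdict(list)

    def store(self, package):
        self._by_kind[package.kind].append(package)

    def request(self, kind, context=None):
        # Return packages of the given kind, optionally limited to one context.
        found = self._by_kind.get(kind, [])
        return [p for p in found if context is None or p.context == context]


# Hypothetical entries, for illustration only.
base = ExperienceBase()
base.store(ExperiencePackage(PackageKind.RELATIONSHIP, "cost model", "flight dynamics"))
base.store(ExperiencePackage(PackageKind.PROCESS, "design review method", "flight dynamics"))
base.store(ExperiencePackage(PackageKind.RELATIONSHIP, "defect model", "ground support"))

for package in base.request(PackageKind.RELATIONSHIP, context="flight dynamics"):
    print(package.central_element)
```

Indexing by kind mirrors the idea that the central element identifies the package; a real experience base would need richer retrieval criteria than a single context string.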


The operation of the two components is based on the Quality Improvement 
Paradigm introduced in the previous section. Each component performs 
activities in all six steps, but for each step one component has a leadership role. 

In the first three phases (Characterize, Set Goals, and Choose Process) the focus 
of the operation is on planning, therefore the project organization has a leading 
role and is supported by the analysts of the experience factory. The outcome of 
these three phases is, on the project organization side, a project plan associated 
with a management control framework, and on the experience factory side a 
support plan also associated with a management control framework. The project 
plan describes the phases and the activities of the project, with their products, 
mutual dependencies, milestones and resources. As far as the experience factory 
side is concerned, the plan describes the support that the experience factory will 
provide for each phase and activity, also with products, mutual dependencies, 
milestones and resources. The two parts of the plan are obviously integrated 
although executed by different components. The management control 
frameworks are composed of data (metrics) and models for monitoring the 
execution of the plan. 

In the fourth phase (Execute) the focus of the operation is on delivering the 
product or service assigned to the project organization, therefore the project 
organization has again a leading role, and is supported by the experience 
factory. The outcome of this phase is the product or service, which represents a 
set of potentially reusable products, processes, and experiences. 

In the fifth and the sixth phases (Analyze and Package) the focus of the operation 
is on capturing project experience and making it available to future similar 
projects, therefore the experience factory has a leading role and is supported by 
the project organization that is the repository of that experience. The outcomes of 
these phases are lessons learned with recommendations for future 
improvements, and new or updated experience packages incorporating the 
experience gained during the project execution. 
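
The division of leadership described in the last three paragraphs can be summarized as a lookup table. This is a minimal sketch; the names `Component`, `LEADER`, and `roles` are invented for the example and do not come from the paper.

```python
from enum import Enum


class Component(Enum):
    PROJECT_ORGANIZATION = "project organization"
    EXPERIENCE_FACTORY = "experience factory"


# Leading component for each of the six QIP steps, as stated in the text.
LEADER = {
    "Characterize": Component.PROJECT_ORGANIZATION,
    "Set Goals": Component.PROJECT_ORGANIZATION,
    "Choose Process": Component.PROJECT_ORGANIZATION,
    "Execute": Component.PROJECT_ORGANIZATION,
    "Analyze": Component.EXPERIENCE_FACTORY,
    "Package": Component.EXPERIENCE_FACTORY,
}


def roles(step):
    """Return (leader, supporter) for a QIP step; both components take part."""
    leader = LEADER[step]
    if leader is Component.PROJECT_ORGANIZATION:
        supporter = Component.EXPERIENCE_FACTORY
    else:
        supporter = Component.PROJECT_ORGANIZATION
    return leader, supporter


for step in LEADER:
    leader, supporter = roles(step)
    print(f"{step}: led by the {leader.value}, supported by the {supporter.value}")
```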

Structuring a software development organization as an experience factory offers 
the ability to learn from every project, constantly increase the maturity of the 
organization and incorporate new technologies into the life cycle. In the long 
term, it supports the overall evolution of the organization from a project-based 
one, where all activities are aimed at the successful execution of current project 
tasks, to a capability-based one, which executes those tasks and capitalizes on 
their execution. 

Some important benefits that an organization derives from structuring itself as 
an experience factory are 

• To establish an improvement process for software substantiated 
and controlled by quantitative data; 

• To produce a repository of software data and models which are 
empirically based on the everyday practice of the organization; 

• To develop an internal support organization that represents a 
limited overhead and provides substantial cost and quality 
performance benefits; 

• To provide a mechanism for identifying, assessing and 
incorporating into the process, new technologies that have proven 
to be valuable in similar contexts; 

• To incorporate reuse into the software development process and 
support it; 

• To approach a Total Quality Management program in a more 
software-specific way. 

The concept of experience factory is an extension and a redefinition of the 
concept of software factory, as it has evolved from the original meaning of 
integrated environment to the one of flexible software manufacturing 
environment [Cusumano, 1991]. The major difference is that, while the software 
factory is thought of as an independent unit producing code by using an 
integrated development environment, the experience factory handles all kinds of 
software-related experience. The software factory can be seen as a part of the 
experience factory, recognizing in this way that its potential benefits can be fully 
exploited only within this framework. 


5. IMPROVEMENT IN PRACTICE: THE NASA SOFTWARE 
ENGINEERING LABORATORY 


In this section we will present and discuss a practical example of experience 
factory organization. We will show how its operation is based on the Quality 
Improvement Paradigm and we will use the case of a specific technology in 
order to illustrate the execution of the steps of the paradigm. 

The organization that provides the example is the Software Engineering 
Laboratory (SEL) at NASA Goddard Space Flight Center. The laboratory was 
established in 1976 as a cooperative effort among the Department of Computer 
Science of the University of Maryland, the National Aeronautics and Space 
Administration Goddard Space Flight Center (NASA/GSFC), and the Computer 
Sciences Corporation (CSC). The goal of the SEL was to understand and improve 
key software development processes and products within a specific 
organization, the Flight Dynamics Division. 

In general, the goals, the structure and the operation of the SEL have evolved 
from an initial stage, a laboratory dedicated to experimentation and 
measurement, to a full scale organization aimed at reusing experience and 
developing strategic capabilities. At the same time, the awareness of the quality 
improvement process used in the laboratory has generated the operational 
paradigm described in this paper as the Quality Improvement Paradigm. Today the 
SEL represents a practical and operational example of an experience factory [Basili 
et al., 1992]. 

The current structure of the SEL is based on three components: 

• Developers, who provide products, plans used in development, and 
data gathered during development and operation (the Project 
Organization); 

• Analysts, who transform these objects provided by the developers 
into reusable units and supply them back to the developers; they 
provide specific support to the projects on the use of the analyzed 
and synthesized information, tailoring it to a format which is 
usable by and useful to a current software effort (the Experience 
Factory proper); 

• Support infrastructure, which provides services to the developers, on 
one hand, by supporting data collection and retrieval, and to the 
analysts, on the other hand, by managing the library of stored 
information and its catalogs (the Experience Base Support). 

The activities of these three sub-organizations, although not separated and 
independent from each other, have their own goal and process models and 
plans. Figure 11 outlines the difference in focus among the three sub- 
organizations. 


Figure 11 


Figure 12 gives an idea of the overall size of the organization and of its 
components. 

We will now show the operation of the SEL following the development of a 
particular core competence through the six steps of the improvement paradigm. 


Figure 12 


In the late 80's the software engineering community, within and outside NASA, 
was discussing, among other technologies, the Ada programming language 
environment and technology [Ada, 1983]: the language had been developed 
under a major effort of the US Department of Defense and its application was 
being considered also in areas outside DoD. NASA was, at that time, considering 
the use of the Ada technology in some major projects such as the Space Station. 
More and more systems would use Ada as their development environment, 
and many organizations would have to be involved with it. In consideration of 
this fact, Ada had to be transformed from a simple technology into a core competence 
for the software development organizations within NASA. 

Associated with Ada there was the issue of object-oriented technologies. It is not 
essential for our discussion that the reader know what an object-oriented 
design technique is. Nevertheless, Figure 13 provides some basic characteristic 
elements [Sommerville, 1992] of the object-oriented approach. 


Figure 13 


Characteristics of the Object-Oriented Approach 

• A system is seen as a set of objects having at each 
time a specific state and behavior 

• Objects interact with each other by exchanging 
messages 

• Objects are organized into classes based on 
common characteristics and behaviors 

• All information about the state or the 
implementation of an object is held within the 
object itself and cannot be deliberately or 
accidentally used by other objects 


The Ada language environment implements several of those features and can be, 
to a certain extent, considered object-oriented. The design of systems to be 
implemented in Ada definitely takes advantage of the concepts of object- 
oriented design. Therefore, from the beginning, there was the impression in the 
SEL that the two technologies should be packaged together into a core 
competence supporting the strategic capability of delivering systems with better 
quality and lower delivery cost. After recognizing that this capability had a 
strategic value for the organization, the SEL selected Ada and the object-oriented 
design technology for supporting it, measured its benefits, and provided 
supporting data to the decision of using the technology. 

The process followed is illustrated in the following steps according to the QIP: 

1. Characterize: In 1985, the SEL had achieved a good understanding of how 
software was developed in the Flight Dynamics Division. The 
development processes had been defined and models had been built in 
order to improve the manageability of the process. The standard 
development methodology, based on the traditional design and build 
approach, had been integrated with concepts aimed at continuously 
evolving systems by successive enhancements. 

2. Set Goals: Realizing that object-oriented techniques, implemented in the 
design and programming environments that support new languages, like 
C++ and Ada, offered potential for major improvements in the areas of 
productivity, quality and reusability of software products and processes, 
the SEL decided to develop a core competence around object-oriented 
design and the use of the programming language Ada. The first step was 
to set up expectations and goals against which results would be 
measured. The SEL's well-established baseline and set of measures 
provided an excellent basis for comparison. Expectations included 

• A change in the effort distribution of development activities: an 
increase of the effort on early phases, e.g., design, and a decrease of 
the effort on late phases, e.g., testing; 

• Increased reuse of software modules, both verbatim and with 
modification; 

• Decreased maintenance costs due to the better quality of reusable 
components; 

• Increased reliability as a result of lower global error rates, fewer 
high-impact interface errors, and fewer design errors. 

3. Choose Process: The SEL decided to approach the development of the 
desired core competence by experimenting with Ada and object-oriented 
design in a "real" project. Two versions of the same system would be 
developed: 

System A: To be developed using FORTRAN and following the 
standard methodology based on functional 
decomposition. This system will become operational 
and its development will follow the ordinary schedule 
constraints. 

System B: To be developed using Ada and following an 
object-oriented methodology called OOD. This system will 
not become operational. 

The data derived from the development of System B would be compared 
with those derived from the development of System A. Particular 
attention would be dedicated to quality and productivity data. The data 
collection and comparison would be based on the Goal/Question/Metric 
Model shown in Figure 14. 


Figure 14 


4. Execute: Systems A and B were implemented and the desired metrics were 
collected. During the development, changes had to be made to the 
approach used for Ada, and adaptations had to be 
made in order to use OOD. For instance, some review procedures that 
were particularly suited for a design based on functional decomposition 
did not fit the approach used for System B. Therefore new review 
procedures were drafted for that development. 

5. Analyze: The data collected based on the previous GQM model showed 
an increase of the cost to develop (Metrics 1.1 and 1.2) that was 
interpreted as due on one hand to the inexperience of the organization 
with the new technology and on the other hand to the intrinsic 
characteristics of the technology itself. The data also showed an increase 
in the cost to deliver (Metrics 2.1 and 2.2) interpreted as due to the same 
causes. The overall quality of System B showed an improvement over 
System A (Metrics 3.2 and 3.1) in terms of a substantially lower error 
density. Reuse data across systems (Metric 4.1) were obviously not 
available for System B because of the new implementation technology. 
The comparative data are shown in Figure 15. 


Figure 15 

[Comparative data for Systems A and B. Surviving values without recoverable row labels: 0.70, 0.65, 3.90. Reuse (%): 30% (System A), N/A (System B).] 


6. Package: The laboratory tailored and packaged an internal version of the 
methodology which adjusted and extended OOD for use in a specific 
environment and on a specific application domain. Commercial training 
courses, supplemented with limited project-specific training, constituted 
the early training in the techniques. The laboratory also produced 
experience reports containing the lessons learned using the new 
technology and recommending refinements to the methodology and the 
standards. 

The data collected from the first execution of the process were encouraging, 
especially on the quality issue, but not conclusive. Therefore further executions 
were planned and carried out in the following years. In conjunction with the 
development methodology, a programming language style guide was 
developed, that provided coding standards for the local Ada environment. At 
least 10 projects have been completed by the SEL using an object-oriented 
technology derived from the one used for System B, but constantly modified and 
improved. The size of single projects, measured in thousand lines of source code 
(KSLOC), ranges from small (38 KSLOC) to large (185 KSLOC). Some 
characteristics of an object-oriented development using Ada emerged early and 
have remained rather constant: no significant change has been observed, for 
instance, in the effort distribution or in the error classification. Other 
characteristics emerged later and took time to stabilize: reuse has increased 
dramatically after the first projects, going from a traditionally constant figure of 
30% reuse across different projects, to a current 96% (89% verbatim reuse). 

Over the years the use of the object-oriented approach and the expertise with 
Ada have matured. Source code analysis of the systems developed with the new 
technology has revealed a maturing use of key features of Ada that have no 
equivalent in the programming environments traditionally used at NASA. Such 
features were not only used more often in more recent systems, but they were 
also used in more sophisticated ways, as revealed by specific metrics used for this 
purpose. Moreover, the use of object-oriented design and Ada features has 
stabilized over the last 3 years, creating an SEL baseline for object-oriented 
developments. 

The charts shown in Figure 16 represent the trend of some significant indicators. 

The cost to develop code in the new environment has remained higher than the 
cost to develop code in the old one. However, because of the high reuse rates 
obtained through the object-oriented paradigm, the cost to deliver a system in 
the new environment has significantly decreased and lies now well below the 
old cost to deliver. 
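
A rough calculation shows how a higher cost to develop can coexist with a lower cost to deliver. The sketch below assumes a simple linear cost model that is not taken from the SEL data: new code costs `cost_new_per_sloc`, and reused code costs a fixed fraction `reuse_factor` of that. Only the 30% and 96% reuse figures come from the text; the system size, the unit costs, and the 0.2 reuse factor are hypothetical, and verbatim and modified reuse are treated alike for simplicity.

```python
def cost_to_deliver(size_sloc, reuse_fraction, cost_new_per_sloc, reuse_factor=0.2):
    """Cost of delivering a system under an assumed linear model.

    reuse_factor is the hypothetical cost of a reused line relative to a
    newly developed one.
    """
    new_sloc = size_sloc * (1.0 - reuse_fraction)
    reused_sloc = size_sloc * reuse_fraction
    return (new_sloc * cost_new_per_sloc
            + reused_sloc * cost_new_per_sloc * reuse_factor)


# Old environment: baseline unit cost, 30% reuse (figure quoted in the text).
old_cost = cost_to_deliver(100_000, 0.30, cost_new_per_sloc=1.0)

# New environment: hypothetically 30% more expensive per new line, 96% reuse.
new_cost = cost_to_deliver(100_000, 0.96, cost_new_per_sloc=1.3)

print(f"old environment: {old_cost:,.0f} cost units")
print(f"new environment: {new_cost:,.0f} cost units")
print("new environment is cheaper to deliver:", new_cost < old_cost)
```

Under these assumptions the reuse rate dominates the result: even a noticeably higher unit cost for new code is outweighed once almost all of the delivered code is reused.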

The reliability of the systems developed in the new environment has improved 
over the years with the maturing of the technology. Although the error rates 
were significantly lower than the traditional ones, they have continued to 
decrease even further; again, the high level of reuse in the later systems is a 
major contributor to this greatly improved reliability. 

Because of the stabilization of the technology and apparent benefit to the 
organization, the object-oriented development methodology has been packaged 
and incorporated into the current technology baseline and is a core competence 
of the organization. And this is where things stand today. 

Although the technology of object-oriented design will continue to be refined 
within the SEL, it has now progressed through all stages, moving from a 
candidate trial methodology to a fully integrated and packaged part of the 
standard methodology, ready for further incremental improvement. 

The example we have just shown illustrates also the relationship between a 
competence (object-oriented technology) and a target capability (deliver high 
quality at low cost), and shows how innovative technologies can enter the 
production cycle of mature organizations in a systematic way. Although the 
topic of technology transfer is not within the scope of this paper, it is clear from 
the SEL example that the model we derive from it outlines a solution to some 
major technology transfer issues. 


Figure 16 


The purpose of an experience factory organization is larger than technology 
transfer: it is capability transfer and reuse. If these capabilities are already 
consolidated into a technology, available within the organization or outside it, 
then the process is a process of technology transfer. If the capabilities are present 
in the organization as informal experience, products prepared for other 
purposes, and lessons learned, then the process is different from technology 
transfer. 


6. CONCLUSIONS 


Clearly the nineties will be the quality era for software and there is a growing 
need to develop or adapt quality improvement approaches to the software 
business. Our approach to software quality improvement, as it has been 
presented in this paper, is based on the exploitation and reuse of the critical 
capabilities of an organization across different projects based on business needs. 

The relationship between core competencies and strategic capabilities is 
established by the kind of products and services the organization wants to 
deliver and is specified by the strategic planning process. A possible mapping is 
shown as an example in Figure 17, in the case of an organization whose main 
business is development of systems and software for user applications. 


Figure 17 


In this paper we have shown, through the NASA example, that all these ideas 
are practically feasible and have been successfully applied in a production 
environment in order to create a continuously improving organization. 

But what does "continuously improving organization" really mean? It is an 
organization that can manipulate its processes to achieve various product 
characteristics. This requires that the organization has a process and an 
organizational structure to 

• Understand its processes and products; 

• Measure and model its business processes; 

• Define process and product quality explicitly, and tailor the 
definitions to the environment; 

• Understand the relationship between process and product quality; 

• Control project performance with respect to quality; 

• Evaluate project success and failure with respect to quality; 

• Learn from experience by repeating successes and avoiding 
failures. 

Using the Quality Improvement Paradigm/Experience Factory Organization 
approach the organization has a good chance to achieve all these capabilities, 
and to move up in the quality excellence scale faster, because it focuses on its 
strategic capabilities and value-added activities. The Experience Factory 
Organization is the lean enterprise model for the system and software business. 


Abstract 

This chapter describes the principles behind a specific set of integrated software 
quality improvement approaches which include the Quality Improvement Paradigm, 
an evolutionary and experimental improvement framework based on the scientific 
method and tailored for the software business, the Goal/Question/Metric Paradigm, a 
paradigm for establishing project and corporate goals and a mechanism for measuring 
against those goals, and the Experience Factory Organization, an organizational 
approach for building software competencies and supplying them to projects on demand. 
It then compares these approaches to a set of approaches used in other businesses, such 
as Plan-Do-Check-Act, Total Quality Management, Lean Enterprise Systems, and 
the Capability Maturity Model. 

1. Introduction 

The concepts of quality improvement have permeated many businesses. It is clear 
that the nineties will be the quality era for software and there is a growing need to 
develop or adapt quality improvement approaches to the software business. Thus 
we must understand software as an artifact and software development as a business. 

Any successful business requires a combination of technical and managerial 
solutions. It requires that we understand the processes and products of the 
business, i.e., that we know the business. It requires that we define our business 
needs and the means to achieve them, i.e., we must define our process and product 
qualities. We need to define closed loop processes so that we can feed back 
information for project control. We need to evaluate every aspect of the business, 
so we must analyze our successes and failures. We must learn from our 
experiences, i.e., each project should provide information that allows us to do business 
better the next time. We must build competencies in our areas of business by 
packaging our successful experiences for reuse and then we must reuse our 
successful experiences or our competencies as the way we do business. 

Since the business we are dealing with is software, we must understand the 
nature of software and software development. Some of the most basic premises 
assumed in this work are that: 

The software discipline is evolutionary and experimental; it is a laboratory 
science. Thus we must experiment with techniques to see how and when they 
really work, to understand their limits, and to understand how to improve them. 

Software is development not production. We do not produce the same things 
over and over but rather each product is different from the last. Thus, unlike in 
production environments, we do not have lots of data points to provide us with 
reasonably accurate models for statistical quality control. 

The technologies of the discipline are human based. It does not matter how 
high we raise the level of discourse or the virtual machine, the development of 
solutions is still based on individual creativity and human ability will always 
create variations in the studies. 

There is a lack of models that allow us to reason about the process and the 
product. This is an artifact of several of the above observations. Since we have 
been unable to build reliable, mathematically tractable models, we have tended 
not to build any. And those that we have, we do not always understand in context. 

Not all software is the same; process is a variable, goals are variable, content 
varies, etc. We have often made the simplifying assumption that software is 
software is software. But this is no more true than that hardware is hardware is 
hardware. Building a satellite and building a toaster are not the same thing, any more 
than building microcode for a toaster and the flight dynamics software for the 
satellite are the same thing. 

Packaged, reusable experiences require additional resources in the form of 
organization, processes, people, etc. The requirement that we build packages of 
reusable experiences implies that we must learn by analyzing and synthesizing 
our experiences. These activities are not a byproduct of software development; 
they require their own set of processes and resources. 

2. Experience Factory/Quality Improvement Paradigm

The Experience Factory/Quality Improvement Paradigm (EF/QIP) (Basili, 
1985, 1989; Basili and Rombach, 1987, 1988) aims at addressing the issues of 
quality improvement in the software business by providing a mechanism for 
continuous improvement through the experimentation, packaging, and reuse of 
experiences based on a business’s needs. The approach has been evolving since 
1976 based on lessons learned in the National Aeronautics and Space
Administration/Goddard Space Flight Center (NASA/GSFC) Software Engineering
Laboratory (SEL) (Basili et al., 1992).

The basis for the approach is the QIP, which consists of six fundamental steps: 

Characterize the current project and its environment with respect to models 
and metrics. 

Set the quantifiable goals for successful project performance and improvement. 

Choose the appropriate process model and supporting methods and tools for 
this project. 

Execute the processes, construct the products, collect and validate the prescribed 
data, and analyze it to provide real-time feedback for corrective action. 

Analyze the data to evaluate the current practices, determine problems, record 
findings, and make recommendations for future project improvements. 

Package the experience in the form of updated and refined models and other 
forms of structured knowledge gained from this and prior projects and save 
it in an experience base to be reused on future projects. 

Although it is difficult to describe the QIP in great detail here, we will provide
a little more insight into each of the preceding six steps.

Characterizing the Project and Environment. Based on a set of 
models of what we know about our business we need to classify the current 
project with respect to a variety of characteristics, distinguish the relevant project 
environment for the current project, and find the class of projects with similar 
characteristics and goals. This provides a context for goal definition, reusable 
experiences and objects, process selection, evaluation and comparison, and
prediction. There is a large variety of project characteristics and environmental
factors that need to be modeled and baselined. They include various people
factors, such as the number of people, level of expertise, group organization,
problem experience, process experience; problem factors, such as the application
domain, newness to state of the art, susceptibility to change, problem constraints,
etc.; process factors, such as the life cycle model, methods, techniques, tools,
programming language, and other notations; product factors, such as deliverables,
system size, required qualities, e.g., reliability, portability, etc.; and resource
factors, such as target and development machines, calendar time, budget, existing
software, etc.

Goal Setting and Measurement. We need to establish goals for the 
processes and products. These goals should be measurable, driven by models of 
the business. There are a variety of mechanisms for defining measurable goals: 
Quality Function Deployment Approach (QFD) (Kogure and Akao, 1983), the 
Goal/Question/Metric Paradigm (GQM) (Weiss and Basili, 1985), and Software 
Quality Metrics Approach (SQM) (McCall et al., 1977). 

We have used the GQM as the mechanism for defining, tracking, and evaluating 
the set of operational goals, using measurement. These goals may be defined for 
any object, for a variety of reasons, with respect to various models of quality, 
from various points of view, relative to a particular environment. For example, 
goals should be defined from a variety of points of view: user, customer, project
manager, corporation, etc.

A goal is defined by filling in a set of values for the various parameters in 
the template. Template parameters include purpose (what object and why),
perspective (what aspect and who), and the environmental characteristics (where). 

Purpose:
    Analyze some
        (objects: process, products, other experience models)
    for the purpose of
        (why: characterization, evaluation, prediction, motivation, improvement)

Perspective:
    With respect to
        (focus: cost, correctness, defect removal, changes, reliability,
        user friendliness, . . .)
    from the point of view of
        (who: user, customer, manager, developer, corporation, . . .)

Environment:
    In the following context
        (problem factors, people factors, resource factors, process factors, . . .)

Example: 

Analyze the (system testing method) for the purpose of (evaluation) with respect 
to a model of (defect removal effectiveness) from the point of view of the 
(developer) in the following context: the standard NASA/GSFC environment, 
i.e., process model (SEL version of the waterfall model, . . .), application (ground
support software for satellites), machine (running on a DEC 780 under VMS), etc. 
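
The goal template can be made concrete as a small record type. The following is a minimal sketch, not part of the GQM definition itself: the class name `GQMGoal` and its field names are assumptions chosen for illustration, while the slot values are taken from the example goal above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GQMGoal:
    """One goal, with one field per slot of the template (names are illustrative)."""
    obj: str        # what is analyzed: a process, product, or other experience model
    purpose: str    # why: characterization, evaluation, prediction, ...
    focus: str      # with respect to: cost, correctness, defect removal, ...
    viewpoint: str  # from the point of view of: user, customer, developer, ...
    context: str    # environment: problem, people, resource, process factors

    def statement(self) -> str:
        """Render the filled-in template as a single goal sentence."""
        return (
            f"Analyze the {self.obj} for the purpose of {self.purpose} "
            f"with respect to {self.focus} from the point of view of the "
            f"{self.viewpoint} in the following context: {self.context}."
        )


goal = GQMGoal(
    obj="system testing method",
    purpose="evaluation",
    focus="a model of defect removal effectiveness",
    viewpoint="developer",
    context="the standard NASA/GSFC environment",
)
print(goal.statement())
```

Keeping each template parameter as a separate field makes it easy to check that no slot of a goal has been left unspecified.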

The goals are defined in an operational, tractable way by refining them into 
a set of quantifiable questions that are used to extract the appropriate information 
from the models of the object of interest and the focus. The questions and models 
define the metrics and the metrics, in turn, specify the data that needs to be 
collected. The models provide a framework for interpretation. 

Thus, the GQM is used to (1) specify the goals for the organization and the 
projects, (2) trace those goals to the data that are intended to define these goals 
operationally, (3) provide a framework for interpreting the data to understand
and evaluate the achievement of the goals, and (4) support the development of
data models based on experience. 
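
The tracing from goals to data can be pictured as a small tree. The sketch below is one possible representation under stated assumptions: the class names, the two questions, the metrics, and the data items are all hypothetical examples, not questions or forms taken from the SEL.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    data_items: list[str]  # raw data that must be collected to compute the metric


@dataclass
class Question:
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

    def data_to_collect(self) -> list[str]:
        """Trace the goal down to the distinct data items it requires, in order."""
        seen: dict[str, None] = {}
        for question in self.questions:
            for metric in question.metrics:
                for item in metric.data_items:
                    seen.setdefault(item, None)
        return list(seen)


# Hypothetical refinement of the system-testing goal.
goal = Goal(
    statement="Evaluate the defect removal effectiveness of system testing",
    questions=[
        Question(
            "How many defects does system testing find?",
            [Metric("defects found in system test", ["defect report forms"])],
        ),
        Question(
            "What fraction of all known defects does system testing find?",
            [Metric("defect removal effectiveness",
                    ["defect report forms", "post-delivery problem reports"])],
        ),
    ],
)
print(goal.data_to_collect())
```

Reading the tree top-down gives the data collection plan; reading it bottom-up gives the interpretation of each collected item in terms of the goal it serves.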

Choosing the Execution Model. We need to be able to choose a generic 
process model appropriate to the specific context, environment, project character- 
istics, and goals established for the project at hand, as well as any goals established 
for the organization, e.g., experimentation with various processes or other experi- 
ence objects. This implies we need to understand under what conditions various 
processes are effective. All processes must be defined to be measurable and 
defined in terms of the goals they must satisfy. The concept of defining goals 
for processes will be made clearer in later chapters. 

Once we have chosen a particular process model, we must tailor it to the 
project and choose the specific integrated set of sub-processes, such as methods 
and techniques, appropriate for the project. In practice, the selection of processes 
is iterative with the redefinition of goals and even some environmental and project 
characteristics. It is important that the execution model resulting from these first 
three steps be integrated in terms of its context, goals, and processes. The real 
goal is to have a set of processes that will help the developer satisfy the goals 
set for the project in the given environment. This may sometimes require that 
we manipulate all three sets of variables to ensure this consistency. 

Executing the Processes. The development process must support the
access and reuse of packaged experience of all kinds. On the other hand, it needs
to be supported by various types of analyses, some done in close to real time
for feedback for corrective action. To support this analysis, data needs to be
collected from the project. But this data collection must be integrated into the
processes; it must not be an add-on, e.g., defect classification forms part of
the configuration control mechanism. Processes must be defined to be measurable
to begin with, e.g., design inspections can be defined so that we keep track of 
the various activities, the effort expended in those activities, such as peer reading, 
and the effects of those activities, such as the number and types of defects found. 
This allows us to measure such things as domain understanding (how well the 
process performer understands the object of study and the application domain) 
and assures that the processes are well defined and can evolve. 

Support activities, such as data validation and education and training in the
models, metrics, and data forms, are also important. Automated support is necessary
to support mechanical tasks and deal with the large amounts of data and information
needed for analysis. It should be noted, however, that most of the data cannot 
be automatically collected. This is because the more interesting and insightful 
data tends to require human response. 
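
As an illustration of a process that is defined to be measurable, a design inspection can carry its own record of activities, effort, and defects found. This is a minimal sketch: the class `InspectionRecord`, its methods, the artifact name, and the numbers are assumptions made for the example, not an SEL data form.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class InspectionRecord:
    """Data captured as part of one design inspection (illustrative layout)."""
    artifact: str
    effort_hours: dict[str, float] = field(default_factory=dict)  # activity -> hours
    defects: list[str] = field(default_factory=list)              # one type per defect

    def log_effort(self, activity: str, hours: float) -> None:
        self.effort_hours[activity] = self.effort_hours.get(activity, 0.0) + hours

    def log_defect(self, defect_type: str) -> None:
        self.defects.append(defect_type)

    def total_effort(self) -> float:
        return sum(self.effort_hours.values())

    def defects_by_type(self) -> Counter:
        return Counter(self.defects)

    def defects_per_hour(self) -> float:
        total = self.total_effort()
        return len(self.defects) / total if total else 0.0


record = InspectionRecord("example design unit")
record.log_effort("peer reading", 3.0)
record.log_effort("inspection meeting", 1.0)
for defect_type in ("interface", "interface", "initialization"):
    record.log_defect(defect_type)

print(record.defects_by_type())
print(record.defects_per_hour())
```

Because the record is filled in while the inspection is performed, the data collection is part of the process rather than an add-on, and most of it still comes from human response.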

The kinds of data collected include: resource data, such as effort by activity,
phase, type of personnel, computer time, and calendar time; change and defect
data, such as changes and defects by various classification schemes; process data,
such as process definition, process conformance, and domain understanding;
product data, such as product characteristics, both logical, e.g., application domain,
function, and physical, e.g., size, structure; and use and context information, e.g.,
who will be using the product and how will they be using it so we can build
operational profiles.

Analyzing the Data. Based on the goals, we interpret the data that has been 
collected. We can use this data to characterize and understand, so we can answer 
questions like “What project characteristics affect the choice of processes, methods
and techniques?” and “Which phase is typically the greatest source of errors?”
We can use the data to evaluate and analyze to answer questions like “What is the
statement coverage of the acceptance test plan?” and “Does the Cleanroom Process
reduce the rework effort?” We can use the data to predict and control to answer
questions like “Given a set of project characteristics, what is the expected cost and
reliability, based upon our history?” and “Given the specific characteristics of all
the modules in the system, which modules are most likely to have defects so I can
concentrate the reading or testing effort on them?” We can use the data to motivate
and improve so we can answer questions such as “For what classes of errors is
a particular technique most effective?” and “What is the best combination of
approaches to use for a project with a continually evolving set of requirements based 
on our organization’s experience?” 

Packaging the Models. We need to define and refine models of all forms 
of experiences, e.g., resource models and baselines, change and defect baselines 
and models, product models and baselines, process definitions and models, method 
and technique evaluations, products and product parts, quality models, and lessons 
learned. These can appear in a variety of forms, e.g., we can have mathematical 
models, informal relationships, histograms, algorithms, and procedures, based 
on our experience with their application in similar projects, so they may be 
reused in future projects. Packaging also includes training, deployment, and 
institutionalization. 

The six steps of the QIP can be combined in various ways to provide different 
views into the activities. First note that there are two feedback loops, a project 
feedback loop that takes place in the execution phase and an organizational 
feedback loop that takes place after a project is completed. The organizational 
learning loop changes the organization’s understanding of the world by the
packaging of what was learned from the last project and as part of the characteriza- 
tion and baselining of the environment for the new project. It should be noted 
that there are numerous other loops visible at lower levels of instantiation, but 
these high-level loops are the most important from an organizational structure 
point of view. 

One high-level organizational view of the paradigm is that we must understand 
(characterize), assess (set goals, choose processes, execute processes, analyze 
data), and package (package experience). Another view is to plan for a project 
(characterize, set goals, choose processes), develop it (execute processes), and 
then learn from the experience (execute processes, analyze data). 

2.1 The Experience Factory Organization 

To support the Improvement Paradigm, an organizational structure called the 
Experience Factory Organization (EFO) was developed. It recognizes the fact 
that improving the software process and product requires the continual accumula- 
tion of evaluated experiences (learning), in a form that can be effectively under- 
stood and modified (experience models), stored in a repository of integrated 
experience models (experience base), that can be accessed or modified to meet 
the needs of the current project (reuse). 

Systematic learning requires support for recording, off-line generalizing, tailor- 
ing, formalizing, and synthesizing of experience. The off-line requirement is 
based on the fact that reuse requires separate resources to create reusable objects. 
Packaging and modeling useful experience requires a variety of models and formal 
notations that are tailorable, extendible, understandable, flexible, and accessible. 

An effective experience base must contain an accessible and integrated set of
models that capture the local experiences. Systematic reuse requires support for
using existing experience and on-line generalizing or tailoring of candidate
experience.

This combination of ingredients requires an organizational structure that
supports: a software evolution model that supports reuse, processes for learning,
packaging, and storing experience, and the integration of these two functions. It
requires separate logical or physical organizations with different focuses and
priorities, process models, and expertise requirements.

We divide the functions into a Project Organization whose focus/priority is 
product delivery, supported by packaged reusable experiences, and an Experience 
Factory whose focus is to support project developments by analyzing and synthe- 
sizing all kinds of experience, acting as a repository for such experience, and 
supplying that experience to various projects on demand. 

The Experience Factory packages experience by building informal, formal or 
schematized, and productized models and measures of various software processes, 
products, and other forms of knowledge via people, documents, and automated 
support. 

The Experience Factory deals with reuse of all kinds of knowledge and
experience. But what makes us think we can be successful with reuse this time, when
we have not been so successful in the past? Part of the reason is that we are not
talking about reuse of only code in isolation but about reuse of all kinds of
experience and of the context for that experience. The Experience Factory
recognizes and provides support for the fact that experience requires the appropriate
context definition to be reusable and it needs to be identified and analyzed
for its reuse potential. It recognizes that experience cannot always be reused as
is, that it needs to be tailored and packaged to make it easy to reuse. In the past, 
reuse of experience has been too informal, and has not been supported by the 
organization. It has to be fully incorporated into the development or maintenance 
process models. Another major issue is that a project’s focus is delivery, not 
reuse, i.e., reuse cannot be a by-product of software development. It requires a 
separate organization to support the packaging and reuse of local experience. 

The Experience Factory really represents a paradigm shift from current software 
development thinking. It separates the types of activities that need to be performed 
by assigning them to different organizations, recognizing that they truly represent 
different processes and focuses. Project personnel are primarily responsible for 
the planning and development activities — the Project Organization (Fig. 1) and 
a separate organization, the Experience Factory (Fig. 2) is primarily responsible 
for the learning and technology transfer activities. In the Project Organization, 
we are problem solving. The processes we perform to solve a problem consist
of the decomposition of a problem into simpler ones, instantiation of
higher-level solutions into lower-level detail, the design and implementation of
various solution processes, and activities such as validation and verification. In
the Experience Factory, we are understanding solutions and packaging experience
for reuse. The processes we perform are the unification of different solutions and
redefinition of the problem, generalization and formalization of solutions in order
to abstract them and make them easy to access and modify, an analysis synthesis
process enabling us to understand and abstract, and various experimentation
activities so we can learn. These sets of activities are totally different.

Fig. 1. The Project Organization.

Fig. 2. The Experience Factory.


2.2 Examples of Packaged Experience in the SEL 

The SEL has been in existence since 1976 and is a consortium of three 
organizations: NASA/GSFC, the University of Maryland, and Computer Sciences 
Corporation (McGarry, 1985; Basili et al., 1992). Its goals have been to (1) 
understand the software process in a particular environment, (2) determine the 
impact of available technologies, and (3) infuse identified/refined methods back 
into the development process. The approach has been to identify technologies 
with potential, apply and extract detailed data in a production environment (experi- 
ments), and measure the impact (cost, reliability, quality, etc.). 

Over the years we have learned a great deal and have packaged all kinds of 
experience. We have built resource models and baselines, e.g., local cost models, 
resource allocation models; change and defect models and baselines, e.g., defect
prediction models, types of defects expected for the application; product models
and baselines, e.g., actual vs. expected product size, library access over time;
process definitions and models, e.g., process models for Cleanroom, Ada waterfall
model; method and technique models and evaluations, e.g., best method for finding
interface faults; products and product models, e.g., Ada generics for simulation of
satellite orbits; a variety of quality models, e.g., reliability models, defect slippage
models, ease of change models; and a library of lessons learned, e.g., risks
associated with an Ada development (Basili et al., 1992; Basili and Green, 1994).

We have used a variety of forms for packaged experience. There are equations 
defining the relationship between variables, e.g., effort = 1.48*KSLOC^0.98, number
of runs = 108 + 150*KSLOC†; histograms or pie charts of raw or analyzed data,
e.g., classes of faults: 30% data, 24% interface, 16% control, 15% initialization, 
15% computation; graphs defining ranges of “normal,” e.g., graphs of size 
growth over time with confidence levels; specific lessons learned associated with 
project types, phases, activities, e.g., reading by stepwise abstraction is most 
effective for finding interface faults; or in the form of risks or recommendations, 
e.g., the unit for unit test in Ada needs to be carefully defined; and
models or algorithms specifying the processes, methods, or techniques, e.g., an 
SADT diagram defining design inspections with the reading technique being a 
variable on the focus and reader perspective. 
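
The two equations quoted in this paragraph can be evaluated directly. The sketch below is hedged in two ways: the function and parameter names are illustrative, and the effort exponent is taken as 0.98, which is an assumption about how the printed equation should be read. The text does not state units for effort here, so none are assumed, and the coefficients are local to the SEL environment.

```python
def effort(ksloc: float, a: float = 1.48, b: float = 0.98) -> float:
    """Effort predicted from size as a * KSLOC**b.

    The exponent b = 0.98 is an assumed reading of the equation in the text.
    """
    if ksloc < 0:
        raise ValueError("size must be non-negative")
    return a * ksloc ** b


def number_of_runs(ksloc: float, base: float = 108.0, per_ksloc: float = 150.0) -> float:
    """Number of runs predicted from size as base + per_ksloc * KSLOC."""
    if ksloc < 0:
        raise ValueError("size must be non-negative")
    return base + per_ksloc * ksloc


for size in (10, 50, 100):
    print(size, round(effort(size), 1), number_of_runs(size))
```

Passing the coefficients as parameters reflects the point made in the following paragraph: another organization would have to fit its own values, or even choose different variables, rather than reuse these.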

Note that these packaged experiences are representative of software develop- 
ment in the Flight Dynamics Division at NASA/GSFC. They take into account 
the local characteristics and are tailored to that environment. Another organization 
might have different models or even different variables for their models and 
therefore could not simply use these models. This inability to just use someone 
else’s models is a result of all software not being the same. 

These models are used on new projects to help management control development 
(Valett, 1987) and provide the organization with a basis for improvement based on
experimentation with new methods. It is an example of the EF/QIP in practice. 

2.3 In Summary 

How does the EF/QIP approach work in practice? You begin by getting a 
commitment. You then define the organizational structure and the associated 
processes. This means collecting data to establish baselines, e.g., defects and 
resources, that are process and product independent, and then measuring your 
strengths and weaknesses to provide a business focus and goals for improvement, 
and establishing product quality baselines. Using this information about your 
business, you select and experiment with methods and techniques to improve 
your processes based on your product quality needs and you then evaluate your 
improvement based on existing resource and defect baselines. You can define 
and tailor better and more measurable processes, based on the experience and 
knowledge gained within your own environment. You must measure for process 
conformance and domain understanding to make sure that your results are valid. 

† KSLOC is thousands of source lines of code.

In this way, you begin to understand the relationship between some process
characteristics and product qualities and are able to manipulate some processes 
to achieve those product characteristics. As you change your processes you 
will establish new baselines and learn where the next place for improvement 
might be. 

The SEL experience is that the cost of the Experience Factory activities amounts
to about 11% of the total software expenditures. The majority of this cost
(approximately 7%) has gone into analysis rather than data collection and archiving.
However, the overall benefits have been measurable. Defect rates have decreased 
from an average of about 4.5 per KLOC to about 1 per KLOC. Cost per system 
has shrunk from an average of about 490 staff months to about 210 staff months 
and the amount of reuse has jumped from an average of about 20% to about 
79%. Thus, the cost of running an Experience Factory has more than paid for 
itself in the lowering of the cost to develop new systems, meanwhile achieving 
an improvement in the quality of those systems. 
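
The size of these improvements is easier to compare when expressed as relative changes. The short calculation below uses only the averages quoted above; the helper name `relative_change` is an illustrative choice.

```python
def relative_change(before: float, after: float) -> float:
    """Signed fractional change from `before` to `after`."""
    return (after - before) / before


defect_change = relative_change(4.5, 1.0)  # defects per KLOC
cost_change = relative_change(490, 210)    # staff months per system
reuse_gain_points = 79 - 20                # percentage points of reuse

print(f"defect rate: {defect_change:+.0%}")
print(f"cost per system: {cost_change:+.0%}")
print(f"reuse: +{reuse_gain_points} percentage points")
```

In other words, the defect rate fell by roughly 78% and the cost per system by roughly 57%, while reuse rose by 59 percentage points, against an Experience Factory overhead of about 11% of software expenditures.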


3. A Comparison with Other Improvement Paradigms 

Aside from the Experience Factory/Quality Improvement Paradigm, there have 
been a variety of organizational frameworks proposed to improve quality for 
various businesses. The ones discussed here include: 

Plan-Do-Check-Act is a QIP based on a feedback cycle for optimizing a
single process model or production line.

Total Quality Management represents a management approach to long-term
success through customer satisfaction based on the participation of all
members of an organization.

The SEI Capability Maturity Model is a staged process improvement based on
assessment with regard to a set of key process areas until you reach level 5,
which represents continuous process improvement.

Lean (software) Development represents a principle supporting the
concentration of the production on “value-added” activities and the
elimination or reduction of “not-value-added” activities.

In what follows, we will try to define these concepts in a little more detail
to distinguish and compare them. We will focus only on the major drivers of
each approach.

3.1 Plan-Do-Check-Act Cycle (PDCA) 

The approach is based on work by Shewhart (1931) and was made popular by
Deming (1986). The goal of this approach is to optimize and improve a single 
process model/production line. It uses such techniques as feedback loops and 
statistical quality control to experiment with methods for improvement and build 
predictive models of the product. 

PLAN -> DO -> CHECK -> ACT -> (back to PLAN)

If a family of processes (P) produces a family of products (X) then the approach
yields a series of versions of product X (each meant to be an improvement of
X), produced by a series of modifications (improvements) to the processes P,

P_0, P_1, P_2, ..., P_n -> X_0, X_1, X_2, ..., X_n

where P_i represents an improvement over P_{i-1} and X_i has better quality than X_{i-1}.

The basic procedure involves four basic steps: 

Plan: Develop a plan for effective improvement, e.g., quality measurement
criteria are set up as targets and methods for achieving the quality criteria 
are established. 

Do: The plan is carried out, preferably on a small scale, i.e., the product is 
produced by complying with development standards and quality guidelines. 

Check: The effects of the plan are observed; at each stage of development, 
the product is checked against the individual quality criteria set up in the 
Plan phase. 

Act: The results are studied to determine what was learned and what can be 
predicted, e.g., corrective action is taken based upon problem reports. 
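
The four steps can be read as a loop that produces the series of process versions and product versions described earlier. The following is a toy sketch only: the function `pdca`, the idea of representing a process by a single number, and the defect-rate formula are all assumptions made to keep the example runnable.

```python
from typing import Callable


def pdca(process: float, target: float,
         do: Callable[[float], float],
         act: Callable[[float, float], float],
         max_cycles: int = 10) -> list[tuple[float, float]]:
    """Return the series [(P_0, X_0), (P_1, X_1), ...] from repeated cycles."""
    series = []
    for _ in range(max_cycles):
        quality = do(process)            # Do: run the process, observe the product
        series.append((process, quality))
        if quality <= target:            # Check: compare with the planned criterion
            break
        process = act(process, quality)  # Act: modify the process for the next cycle
    return series


# Toy stand-ins: the "process" is hours of inspection per KLOC and the product
# quality is a defect rate that falls as inspection effort rises.
series = pdca(
    process=1.0,
    target=2.0,                          # Plan: the target defect rate
    do=lambda hours: 8.0 / (1.0 + hours),
    act=lambda hours, defects: hours + 1.0,
)
for p, x in series:
    print(p, round(x, 2))
```

The loop only makes sense when the same process is repeated often enough for the measurements to be comparable, which is the production assumption behind PDCA.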

3.2 Total Quality Management (TQM) 

The term Total Quality Management (TQM) was coined by the Naval Air 
Systems Command in 1985 to describe its Japanese-style management approach 
to quality improvement (Feigenbaum, 1991). The goal of TQM is to generate 
institutional commitment to success through customer satisfaction. The ap- 
proaches to achieving TQM vary greatly in practice so to provide some basis 
for comparison, we offer the approach being applied at Hughes. Hughes uses 
such techniques as QFD, design of experiments (DOE), and statistical process 
control (SPC), to improve the product through the process. 


Identify needs (Customer) -> Identify important items (QFD) -> Make
improvements (DOE) -> Hold gains (SPC) -> Provide satisfaction (Product)

The approach has similar characteristics to the PDCA approach. If Process
(P) -> Product (X) then the approach yields

P_0, P_1, P_2, ..., P_n -> X_0, X_1, X_2, ..., X_n

where P_i represents an improvement over P_{i-1} and X_i provides better customer
satisfaction than X_{i-1}.

In this approach, after identifying the needs of the customer, you use QFD to 
identify important items in the development of the system. DOE is employed to 
make improvements and SPC is used to control the process and hold whatever
gains have been made. This should then provide the specified satisfaction in the 
product based upon the customer needs. 

3.3 SEI Capability Maturity Model (CMM)

The approach is based upon organizational and quality management maturity 
models developed by Likert (1967) and Crosby (1980), respectively. A software 
maturity model was developed by Radice et al. (1985) at IBM. It
was made popular by Humphrey (1989) at the SEI. The goal of the approach is 
to achieve a level 5 maturity rating, i.e., continuous process improvement via 
defect prevention, technology innovation, and process change management. 

As part of the approach, a five-level process maturity model is defined (Fig. 
3). A maturity level is defined based on repeated assessment of an organization’s 
capability in key process areas (KPA). KPAs include such processes as Require- 
ments Management, Software Project Planning, Project Tracking and Oversight, 
Configuration Management, Quality Assurance, and Subcontractor Management. 
Improvement is achieved by action plans for processes that had a poor assess- 
ment result. 

Thus, if a Process (P) is level i then modify the process based upon the key 
processes of the model until the process model is at level i + 1. Different KPAs
play a role at different levels.

The SEI has developed a Process Improvement Cycle to support the movement 
through process levels. Basically it consists of the following activities: 

Initialize
    Establish sponsorship
    Create vision and strategy
    Establish improvement structure

For each Maturity level:
    Characterize current practice in terms of KPAs
    Assessment recommendations
    Revise strategy (generate action plans and prioritize KPAs)

    For each KPA:
        Establish process action teams
        Implement tactical plan, define processes, plan and execute pilot(s),
        plan and execute
        Institutionalize

    Document and analyze lessons
    Revise organizational approach

Level           Focus
5 Optimizing    Continuous Process Improvement
4 Managed       Product & Process Quality
3 Defined       Engineering Process
2 Repeatable    Project Management
1 Initial       Heroes

Fig. 3. CMM maturity levels.

3.4 Lean Enterprise Management 

The approach is based on a philosophy that has been used to improve factory 
output. Womack et al. (1990) have written a book on the application of lean
enterprises in the automotive industry. The goal is to build software using the minimal
set of activities needed, eliminating nonessential steps, i.e., tailoring the process to
the product needs. The approach uses such concepts as technology management,
human-centered management, decentralized organization, quality management,
supplier and customer integration, and internationalization/regionalization.

Given the characteristics for product V, select the appropriate mix of
sub-processes p_i, q_j, r_k, ... to satisfy the goals for V, yielding a minimal tailored
process P_V which is composed of p_i, q_j, r_k, ...

Process (P_V) -> Product (V)


3.5 Comparing the Approaches 

As stated above, the Quality Improvement Paradigm has evolved over 17 years 
based on lessons learned in the SEL (Basili, 1985, 1989; Basili and Rombach, 
1987, 1988; Basili et al., 1992). Its goal is to build a continually improving 
organization based upon its evolving goals and an assessment of its status relative 
to those goals. The approach uses internal assessment against the organization’s
own goals and status (rather than process areas) and such techniques as GQM, 
model building, and qualitative/quantitative analysis to improve the product 
through the process. 

Characterize -> Set Goals -> Choose Process -> Execute -> Analyze -> Package
(with two feedback loops: a project loop and a corporate loop)

If Processes (P_X, Q_Y, R_Z, ...) -> Products (X, Y, Z, ...) and we want to
build V, then based on an understanding of the relationship between P_X, Q_Y, R_Z,
... and X, Y, Z, ... and goals for V we select the appropriate mix of processes
p_i, q_j, r_k, ... to satisfy the goals for V, yielding a tailored

Process (P_V) -> Product (V)

The EF/QIP is similar to the PDCA in that they are both based on the scientific 
method. They are both evolutionary paradigms, based on feedback loops from 
product to process. The process is improved via experiments; process modifica- 
tions are tried and evaluated and that is how learning takes place. 

The major differences are due to the fact that the PDCA paradigm is based 
on production, i.e., it attempts to optimize a single process model/production 
line, whereas the QIP is aimed at development. In development, we rarely replicate 
the same thing twice. In production, we can collect a sufficient set of data based 
upon continual repetition of the same process to develop quantitative models of 
the process that will allow us to evaluate and predict quite accurately the effects 
of the single process model. We can use statistical quality control approaches 
with small tolerances. This is difficult for development, i.e., we must learn from
one process about another, so our models are less rigorous and more abstract.
Development processes are also more human based. This again affects the
building, use, and accuracy of the types of models we can build. So although
development models may be based on experimentation, the building of baselines and
statistical sampling, the error estimates are typically high.

The EF/QIP approach is compatible with TQM in that it can cover goals that 
are customer satisfaction driven and it is based on the philosophy that quality is 
everyone’s job. That is, everyone is part of the technology infusion process. Some- 
one can be on the project team on one project and on the experimenting team on 
another. All the project personnel play the major role in the feedback mechanism. 
If they are not using the technology right it can be because they don’t understand 
it, e.g., it wasn’t taught right, it doesn’t fit or interface with other project activities, 
it needs to be tailored, or it simply doesn’t work. You need the user to tell you how 
to change it. The EF/QIP philosophy is that no method is “packaged” that hasn’t 
been tried (applied, analyzed, tailored). The fact that it is based upon evolution, 
measurement, and experimentation is consistent with TQM. 

The differences between EF/QIP and TQM are based on the fact that the 
QIP offers specific steps and model types and is defined specifically for the 
software domain. 

The EF/QIP approach is most similar to the concepts of Lean Enterprise 
Management in that they are both based upon the scientific method/PDCA philos- 
ophy. They both use feedback loops from product to process and learn from 
experiments. More specifically, they are both based upon the ideas of tailoring 
a set of processes to meet the particular problem/product under development. The 
goal is to generate an optimum set of processes, based upon models of the 
business and our experience about the relationship between process characteristics 
and product characteristics. 

The major differences are once again based upon the fact that LEM was 
developed for production rather than development and so model building is based 
on continual repetition of the same process. Thus, one can gather sufficient data 
to develop accurate models for statistical quality control. Since the EF/QIP is 
based on development and the processes are human based, we must learn from 
the application of one set of processes in a particular environment about another 
set of processes in a different environment. So the model building is more difficult, 
the models are less accurate, and we have to be cautious in the application of 
the models. This learning across projects or products also requires two major 
feedback loops, rather than one. In production, one is sufficient because the 
process being changed on the product line is the same one that is being packaged 
for all other products. In the EF/QIP, the project feedback loop is used to help 
fix the process for the particular project under development and it is with the 
corporate feedback loop that we must learn by analysis and synthesis across 
different product developments. 

The EF/QIP organization is different from the SEI CMM approach, in that the 
latter is really more an assessment approach than an improvement approach. 

In the EF/QIP approach, you pull yourself up from the top rather than pushing 
up from the bottom. At step 1 you start with a level 5 style organization even 
though you do not yet have level 5 process capabilities. That is, you are driven 
by an understanding of your business, your product and process problems, your 
business goals, your experience with methods, etc. You learn from your business, 
not from an external model of process. You make process improvements based 
upon an understanding of the relationship between process and product in your 
organization. Technology infusion is motivated by the local problems, so people 
are more willing to try something new. 

But what does a level 5 organization really mean? It is an organization that 
can manipulate process to achieve various product characteristics. This requires 
that we have a process and an organizational structure to help us: understand 
our processes and products, measure and model the project and the organization, 
define and tailor process and product qualities explicitly, understand the relation- 
ship between process and product qualities, feed back information for project 
control, experiment with methods and techniques, evaluate our successes and 
failures, learn from our experiences, package successful experiences, and reuse 
successful experiences. This is compatible with the EF/QIP organization. 

QIP is not incompatible with the SEI CMM model in that you can still use 
key process assessments to evaluate where you stand (along with your internal 
goals, needs, etc.). However, using the EF/QIP, the chances are that you will 
move up the maturity scale faster. You will have more experience early on 
operating within an improvement organization structure, and you can demonstrate 
product improvement benefits early. 

4. Conclusion 

An important characteristic of the EF/QIP process is that it is 
iterative; you should converge over time, so don’t be overly concerned with 
perfecting any step on the first pass. However, the better your initial guess at 
the baselines, the quicker you will converge. 

No method is “packaged” that hasn’t been tried (applied, analyzed, tailored). 
Everyone is part of the technology infusion process. Someone can be on the 
project team on one project and on the experimenting team on another. Project 
personnel play the major role in the feedback mechanism. We need to learn from 
them about the effective use of technology. If they are not using the technology 
right it can be because they don’t understand it or it wasn’t taught right, it doesn’t 
fit/interface with other project activities, it needs to be tailored, or it doesn’t 
work and you need the user to tell you how to change it. Technology infusion 
is motivated by the local problems, so people are more willing to try something 
new. In addition, it is important to evaluate process conformance and domain 
understanding or you have very little basis for understanding and assessment. 

The integration of the Improvement Paradigm, the Goal/Question/Metric Para- 
digm, and the EFO provides a framework for software engineering development, 
maintenance, and research. It takes advantage of the experimental nature of 
software engineering. Based upon our experience in the SEL and other organiza- 
tions, it helps us understand how software is built and where the problems are, 
define and formalize effective models of process and product, evaluate the process 
and the product in the right context, predict and control process and product 
qualities, package and reuse successful experiences, and feed back experience 
to current and future projects. It can be applied today and evolve with technology. 

The approach provides a framework for defining quality operationally relative 
to the project and the organization, justification for selecting and tailoring the 
appropriate methods and tools for the project and the organization, a mechanism 
for evaluating the quality of the process and the product relative to the specific 
project goals, and a mechanism for improving the organization’s ability to develop 
quality systems productively. The approach is being adopted by several organiza- 
tions to varying degrees, such as Motorola and HP, but it is not a simple solution 
and it requires long-term commitment by top-level management. 

In summary, the QIP approach provides for a separation of concerns and 
focus in differentiating between problem solving and experience modeling and 
packaging. It offers support for learning and reuse and a means of formalizing 
and integrating management and development technologies. It allows for the 
generation of a tangible corporate asset: an experience base of software 
competencies. It offers a Lean Enterprise Management approach compatible with TQM 
while providing a level 5 CMM organizational structure. It links focused research 
with development. Best of all, you can start small, evolve, and expand, e.g., focus 
on a homogeneous set of projects or a particular set of packages and build from 
there. So any company can begin new and evolve. 
Abstract 

One important component of a software process is the 
organizational context in which the process is enacted. 
This component is often missing or incomplete in current 
process modeling approaches. One technique for modeling 
this perspective is the Actor-Dependency (AD) Model. 
This paper reports on a case study which used this 
approach to analyze and assess a large software 
maintenance organization. Our goal was to identify the 
approach's strengths and weaknesses while providing 
practical recommendations for improvement and research 
directions. The AD model was found to be very useful in 
capturing the important properties of the organizational 
context of the maintenance process, and aided in the 
understanding of the flaws found in this process. However, 
a number of opportunities for extending and improving the 
AD model were identified. Among others, there is a need 
to incorporate quantitative information to complement the 
qualitative model. 

1. Introduction 

It has now been recognized that, in order to improve the 
quality of software products, it is necessary to enhance the 
quality of the software processes used to develop and 
maintain them. This requires the understanding and 
modeling of these processes in order to be able to analyze 
and assess them. Following the example of other 
engineering disciplines, where empirical approaches to 
management have been successfully applied, several 
methodologies have been defined for allowing the 
characterization and incremental improvement of software 
processes [Basili, 1989; Lanphar, 1990]. The modeling of 
software development and maintenance processes is a 
necessary component of these approaches. A number of 
modeling techniques have been developed, e.g., [Lott and 
Rombach, 1993; Finkelstein et al., 1994; Melo and 
Belkhatir, 1994]. 

What has received less attention in the literature is the need 
to model the organizational context in which a development 
process executes. It is not possible to fully understand and 
analyze such process issues as information flow, division of 
work, and coordination without including the organizational 
context in the analysis. Organizational context refers to 
characteristics of relationships between process participants. 
Such relationships include, among others, the management 
hierarchy, the structure of ad hoc working groups, and 
seating arrangements. Some process modeling approaches 
attempt to include mechanisms in their formalisms to deal 
with organizational structure [Curtis et al., 1992], but not 
to any great level of detail. Some formalisms have 
specifically focused on organizational modeling [Rein, 
1992; Benus, 1994], but these lack the mechanisms and 
flexibility necessary for quantitative analysis (discussed 
later). 

One approach to organization and process modeling appears 
particularly promising [Yu and Mylopoulos, 1994]. This 
approach is very new (it was presented at last year's ICSE) 
and thus lacks significant validation through use. One goal 
of this paper is to report an early experience with this 
promising new approach. 

Consistent with the philosophy presented above, [Briand et 
al., 1994] have developed an auditing process specifically 
aimed at software maintenance processes and organizations. 
Such an approach requires, to a certain level of detail, the 
modeling of processes and organizations. In this context, 
the Actor-Dependency modeling technique [Yu and 
Mylopoulos, 1994] mentioned above appeared to be 
suitable because of its capability to capture numerous kinds 
of constraints and dependencies frequently encountered in 
complex organizations. In addition, this technique proved in 
practice to be intuitive and to facilitate communication with 
the maintenance staff of the studied organization. This paper 
reports one experience of using the Actor-Dependency 
technique to help analyze a large software maintenance 
organization. We evaluate the approach's strengths and 
weaknesses while providing practical recommendations for 
improvement. In Section 2, we briefly describe the Actor- 
Dependency modeling approach and one of its extensions 
that we have used in our study. Section 3 presents the case 
study we conducted. In Section 4 we evaluate the 
advantages and weaknesses of the AD approach. Finally, in 
Section 5, we present a number of suggestions for future 
work, both with the AD model and software organization 
modeling in general. 

2. The Actor-Dependency Modeling 
Approach 

The most important characteristic of the modeling approach 
presented by Yu et al., for our purposes, is its capability to 
fully represent the organizational context in which a 
development process is performed. This language provides 
a basic organizational model with several enhancements, 
only one of which we will describe here. The basic Actor- 
Dependency model represents an organizational structure as 
a network of dependencies among organizational entities, or 
actors. The enhancement which we have used, called the 
Agent-Role-Position (ARP) model, provides a useful 
decomposition of the actors themselves. These two 
representations are described briefly in the following 
sections. For a more detailed description, see [Yu and 
Mylopoulos, 1993]. 

2.1. The basic Actor-Dependency (AD) model 

In this model, an organization is described as a network of 
interdependencies among active organizational entities, i.e., 
actors. A node in such a network represents an 
organizational actor, and a link indicates a dependency 
between two actors. Examples of actors are: someone who 
inspects units, a project manager, or the person who gives 
authorization for final shipment. Documents to be 
produced, goals to be achieved, and tasks to be performed 
are examples of dependencies between actors. When an 
actor, A1, depends on A2, through a dependency D1, it 
means that A1 cannot achieve, or cannot efficiently achieve, 
its goals if A2 is not able or willing to fulfill its 
commitment to D1. The AD model provides four types of 
dependencies between actors: 

• In a goal dependency, an actor (the depender) depends on 
another actor (the dependee) to achieve a certain goal or 
state, or fulfill a certain condition (the dependum). The 
depender does not specify how the dependee should do 
this. A fully built configuration, a completed quality 
assessment, or 90% test coverage of a software 
component might be examples of goal dependencies if 
no specific procedures are provided to the dependee(s). 

• In a task dependency, the depender relies on the 
dependee to perform some task. This is very similar to 
a goal dependency, except that the depender specifies 
how the task is to be performed by the dependee, 
without making the goal to be achieved by the task 
explicit. Unit inspections are examples of task 
dependencies if specific standard procedures are to be 
followed. 

• In a resource dependency, the depender relies on the 
dependee for the availability of an entity (physical or 
informational). Software artifacts (e.g. designs, source 
code, binary code), software tools, and any kind of 
computational resources are examples of resource 
dependencies. 

• A soft-goal dependency is similar to a goal 
dependency, except that the goal to be achieved is not 
sharply defined, but requires clarification between 
depender and dependee. The criteria used to judge 
whether or not the goal has been achieved are uncertain. 
Soft-goals are used to capture informal concepts which 
cannot be expressed as precisely defined conditions, as 
are goal dependencies. High product quality, user- 
friendliness, and user satisfaction are common 
examples of soft-goals because in most environments, 
they are not precisely defined. 

Three different categories of dependencies can be established 
based on degree of criticality: 

• Open dependency: the depender's goals should not be 
significantly affected if the dependee does not fulfill his 
or her commitment. 

• Committed dependency: some planned course of action, 
related to some goal(s) of the depender, will fail if the 
dependee fails to provide what he or she has committed 
to. 

• Critical dependency: failure of the dependee to fulfill 
his or her commitment would result in the failure of all 
known courses of action towards the achievement of 
some goal(s) of the depender. 

The concepts of open, committed, and critical dependencies 
can be used to help understand actors' vulnerabilities and 
associated risks. In addition, we can identify ways in which 
actors alleviate this risk. A commitment is said to be: 

• enforceable if the depender can cause some goal of the 
dependee to fail. 

• assured if there is evidence that the dependee has an 
interest in delivering the dependum. 

• insured if the depender can find alternative ways to have 
his or her dependum delivered. 

In summary, a dependency is characterized by three 
attributes: type, level of criticality, and its associated risk- 
management mechanisms. The type (resource, soft-goal, 
goal, and task) represents the issue captured by the 
dependency, while the level of criticality indicates how 
important the dependency is to the depender. Risk- 
management mechanisms allow the depender to reduce the 
vulnerability associated with a dependency. 
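
The three attributes summarized above map naturally onto a small record type. The following is a minimal sketch in Python, not part of the AD notation itself: the names `DepType`, `Criticality`, `Mechanism`, `Dependency`, and `is_exposed` are hypothetical, and the criticality levels and mechanisms attached to the Manager/Tester/Developer dependencies of Figure 1 are assumptions made for illustration, since the text does not state them.

```python
from dataclasses import dataclass
from enum import Enum


class DepType(Enum):
    GOAL = "goal"
    TASK = "task"
    RESOURCE = "resource"
    SOFT_GOAL = "soft-goal"


class Criticality(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    CRITICAL = "critical"


class Mechanism(Enum):
    ENFORCEABLE = "enforceable"
    ASSURED = "assured"
    INSURED = "insured"


@dataclass(frozen=True)
class Dependency:
    depender: str
    dependee: str
    dependum: str
    dep_type: DepType
    criticality: Criticality
    mechanisms: frozenset = frozenset()

    def is_exposed(self) -> bool:
        """True for a critical dependency with no risk-management mechanism."""
        return self.criticality is Criticality.CRITICAL and not self.mechanisms


C, K = Criticality.COMMITTED, Criticality.CRITICAL

# Dependency types follow the Figure 1 description; the criticality levels
# and mechanisms are ASSUMED for this sketch.
FIGURE_1 = [
    Dependency("Manager", "Tester", "test", DepType.TASK, C),
    Dependency("Manager", "Developer", "develop", DepType.GOAL, K,
               frozenset({Mechanism.ENFORCEABLE})),
    Dependency("Tester", "Manager", "positive evaluation", DepType.GOAL, C),
    Dependency("Developer", "Manager", "positive evaluation", DepType.GOAL, C),
    Dependency("Tester", "Developer", "code to be tested", DepType.RESOURCE, K),
    Dependency("Developer", "Tester", "good coverage", DepType.SOFT_GOAL, C),
]

exposed = [d.dependum for d in FIGURE_1 if d.is_exposed()]
print(exposed)
```

Under these assumed attributes, only the Tester's resource dependency on the Developer is flagged, because it is critical and has no enforcement, assurance, or insurance attached.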


Figure 1 shows a simple example of an AD model. A 
Manager oversees a Tester and a Developer. The Manager 
depends on the Tester to test. This is a task dependency 
because there is a defined set of procedures that the Tester 
must follow. In contrast, the Manager also depends on the 
Developer to develop, but the Developer has complete 
freedom to follow whatever process he or she wishes, so 
this is expressed as a goal dependency. Both the Tester and 
the Developer depend on the Manager for positive 
evaluations, where there are specific criteria to define 
"positive", thus these are goal dependencies. The Tester 
depends on the Developer to provide the code to be tested (a 
resource), while the Developer depends on the Tester to test 
the code well (good coverage). Assuming that there are no 
defined criteria for "good" coverage, this is a soft-goal 
dependency. 

Figure 1: A simple example of an AD model 

2.2. The Agent-Role-Position (ARP) 
decomposition 

In the previous section, what we referred to as an actor is in 
fact a composite notion that can be refined in several ways 
to provide different views of the organization. Agents, 
roles, and positions are three possible specializations of the 
notion of actor which are related as follows: 

• an agent occupies one or more positions 

• an agent plays one or more roles 

• a position can cover different roles in different contexts 

Figure 2. Associated Agent, Position, and Role 

Figure 2 shows an example of an actor decomposition. 
These three types of specialization are useful in several 
ways. They can be used to represent the organization at 
different levels of detail. At a very high level, one might 
use only unspecialized actors. Positions provide more 
detail, but still provide a high-level view. Roles provide 
yet more detail, and the use of agents allows the modeler to 
specify even specific individuals. The ARP decomposition 
could be especially useful when extending the use of AD 
models to quantitative analysis. This is described in more 
detail in Section 5. 

3. A Case Study using the AD Model 

3.1. Background 

Before describing our results, some background is necessary 
in order for the reader to understand the analysis which 
follows. We will first describe the development 
environment which serves as our context. Second, the 
auditing process used in the case study will be described. 

3.1.1. The Studied Maintenance Organization 

This study took place in the Flight Dynamics Division 
(FDD) of the NASA Goddard Space Flight Center (GSFC). 
Over one hundred software systems for the control and 
prediction of satellite orbits, trajectories and attitude, 
totalling about 3-5 million lines of code, are maintained. 
Many of these systems are maintained over a very long 
period of time, and regularly produce new releases. About 
80 people are involved in the maintenance of these systems. 
This study focused on three systems in particular, which 
ranged from 156 to 260 thousand lines of FORTRAN code, 
and from 7 to 26 years of age. 

Numerous communication, schedule, budget and technical 
problems arise with each release. This results in somewhat 
unstable change requirements all along the release process, a 
high turnover in some projects and difficulties in meeting 
deadlines. There was a need to study these phenomena. 

More precisely, our framework for this study is the 
Software Engineering Laboratory (SEL). The SEL is a joint 
venture between the University of Maryland, CSC and 
NASA. The SEL is an organization aimed at improving 
NASA-FDD software development processes based on 
measurement and empirical analysis. Recently, responding 
to the growing cost of software maintenance at NASA- 
FDD, the SEL has initiated a program aimed at 
characterizing, evaluating and improving its maintenance 
processes. The first step in this direction was a set of 
studies conducted using the auditing technique described 
below. 

3.1.2. The Maintenance Process Auditing 
Methodology 

In [Briand et al., 1994], a qualitative and inductive 
methodology has been proposed in order to characterize and 
audit software maintenance processes and organizations and 
thereby identify their specific problems and needs. This 
methodology encompasses a set of procedures which 
attempts to determine causal links between maintenance 
problems and flaws in the maintenance organization and 
process. This allows for a set of concrete steps to be taken 
for maintenance quality and productivity improvement, 
based on a tangible understanding of the relevant 
maintenance issues in a particular maintenance 
environment. The steps of this methodology can be 
summarized as follows: 

Step 1: Identify the organizational entities with which the 
maintenance team interacts and the organizational 
structure in which maintainers operate. In this step the 
distinct teams and their roles in a change process are 
identified. Information flows between actors should 
also be determined. 

Step 2: Identify the phases involved in the creation of a 
new system release. Software artifacts produced and 
consumed by each phase must be identified. Actors 
responsible for producing and validating the output 
artifacts of each phase have to be identified and located 
in the organizational structure defined in Step 1. 

Step 3: Identify the generic activities involved in each 
phase, i.e., decompose life-cycle phases to a lower level 
of granularity. Identify, for each low-level activity, its 
inputs and outputs and the actors responsible for them. 

Step 4: Select one or several past releases for analysis in 
order to better understand process and organization 
flaws. 

Step 5: Analyze the problems that occurred while 
performing the software changes in the selected releases 
in order to produce a causal analysis document. The 
knowledge and understanding acquired through steps 1-3 
are necessary in order to understand, interpret and 
formalize the information described in the causal 
analysis document. 

Step 6: Establish the frequency and consequences of 
problems due to flaws in the organizational structure 
and the maintenance process by analyzing the 
information gathered in Step 5. 
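
Step 6 is essentially a tally over the causal analysis records produced in Step 5. A minimal sketch of such a tally follows, assuming each record can be reduced to a release, a flaw category, and an effort cost; the record layout, the `summarize` function, and all of the data values are hypothetical and are not taken from the study.

```python
from collections import Counter

# HYPOTHETICAL causal-analysis records: (release, flaw category, hours lost).
RECORDS = [
    ("release A", "ambiguous requirements", 40),
    ("release A", "late change request", 16),
    ("release B", "ambiguous requirements", 24),
    ("release B", "missing data collection", 8),
    ("release B", "ambiguous requirements", 12),
]


def summarize(records):
    """Return (flaw, occurrences, total hours) rows, most frequent first."""
    count = Counter()
    hours = Counter()
    for _release, flaw, lost in records:
        count[flaw] += 1
        hours[flaw] += lost
    return sorted(
        ((flaw, count[flaw], hours[flaw]) for flaw in count),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


for flaw, n, lost in summarize(RECORDS):
    print(f"{flaw}: {n} occurrence(s), {lost} hours")
```

Ranking by frequency first and consequence second is one possible choice; an organization more concerned with cost than with recurrence could reverse the sort keys.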

Modeling the organizational context of the maintenance 
process was a very important step in the above analysis 
process. A model of the organization was necessary for 
communication with maintenance process participants. 
Gathering organizational information and building the 
model was critical to our understanding of the work 
environment and differences across projects. The model was 
also useful in checking the consistency and completeness of 
the maintenance process model. For example, the 
organizational model allowed us to determine whether or 
not all organizational actors had defined roles in the process 
model. During this preliminary study, the following 
requirements were identified for an optimal organizational 
modeling technique: 

R1: The modeling methodology had to facilitate the 
detection of conflicts between organizational structures 
and goals. For example, inconsistencies between the 
expectations and intentions of interfacing actors seemed 
to be a promising area of investigation. 

R2: We needed to capture many different types of 
relationships between actors. These included 
relationships that contributed to information flow, 
work flow, and fulfillment of goals. The explicit and 
comprehensive modeling of all types of relationships 
was necessary in this context. 

R3: Different types of organizational entities had to be 
captured: individuals, their official position in the 
organizational structure, and their roles and activities in 
the maintenance process. This was important not only 
to be able to model at different levels of detail, but also 
to provide different views of the organization, each 
relaying different information. 

R4: Links between the organization and the maintenance 
process model had to be represented explicitly. 

R5: The notation had to aid in communication through 
intuitive concepts and graphical representation. 

As a starting point, we decided to use the Actor-Dependency 
model introduced by Yu et al. in order to reach these 
objectives. The AD model, as we shall see, meets many of 
our requirements. 

In the next section, we provide the extended AD model of 
our maintenance organization where, for the sake of 
simplification, we use only positions (one possible 
specialization of actors) as vertices of the graph. 

3.2. AD Organizational Model 

The organizational model in Figure 3 is very complex 
despite important simplifications (e.g., agents and roles are 
not represented). This shows how intricate the network of 
dependencies in a large software maintenance organization 
can be. The lessons learned with respect to the maintenance 
organization are presented below and the approach's 
advantages and drawbacks are the focus of the next section. 

The organizational model presented in Figure 3 was built 
using information from a variety of sources: we read 
maintenance standards and release documents, interviewed 
people involved in the change and configuration 
management process, analyzed release management reports, 
and studied the official organization charts. 

The model is by necessity incomplete. We have focused on 
those positions and activities which contribute to the 
maintenance process only. So there are many other actors 
in the NASA-FDD organization which do not appear in the 
AD graph. As well, we have aggregated some of the 
positions where appropriate. For example, Maintenance 
Management includes a large number of separate actors, but 
for the purposes of our analysis, they can be treated as an 
aggregate. Because only the primary dependencies are shown 
at this level of detail, nearly all of them are shown as 
critical. This issue will be discussed in more detail later. 
Below are listed the positions shown in the figure, and a 
short explanation of their specific roles: 

Testers present acceptance test plans, perform acceptance 
tests, and provide change requests to the maintainers 
when necessary. 

Users suggest, control and approve performed changes. 

QA Engineer controls maintainers' work (e.g., conformance 
to standards), attends release meetings, and audits 
delivery packages. 

Configuration Manager integrates updates into the system, 
coordinates the production and release of versions of the 
system, and provides tracking of change requests. 

Maintenance Management grants preliminary approvals of 
maintenance change requests and release definitions. 

Maintainers analyze changes, make recommendations, 
perform changes, perform unit and change validation 
testing after linking the modified units to the existing 
system, and perform validation and regression testing after 
the system is recompiled by the Configuration 
Manager. 

Process Analyst collects and analyzes data from all projects 
and packages data to be reused. 

NASA Management is officially responsible for selecting 
software changes, gives official authorizations, and 
provides the budget. 

The resulting organizational model was validated through 
use, within the context of the auditing methodology 
presented above. The modeling of the maintenance process, 
the release documents, and the causal analysis of 
maintenance problems allowed us to check the model for 
consistency and completeness. 

3.3 Lessons Learned 

Below are the main flaws that were found in the 
maintenance process and which we reported to the 
maintenance organization. In all cases, the flaws were 
uncovered, or at least better understood, by studying the AD 
model. 

Task Leader 

From our analysis, it appears that the Task Leader is a very 
central position. This is clearly illustrated in Figure 3. 
The centrality of the Task Leader gives rise to two possible 
problems: overloading of the person filling this position, 
and over-dependence of the project on this one position. 
Analysis of the Task Leader's role decomposition, 
especially in conjunction with quantitative analysis, would 
be helpful in determining the extent of these problems, and 
possible solutions. 
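
One simple form of the quantitative analysis mentioned above is to count how many dependencies enter and leave each position. The sketch below does this for a list of (depender, dependee) pairs. The edge list is hypothetical and is not the actual content of Figure 3, and `dependency_load` is an illustrative name rather than part of the AD model.

```python
from collections import Counter

# HYPOTHETICAL (depender, dependee) pairs; not the actual edges of Figure 3.
EDGES = [
    ("Maintainer", "Task Leader"),
    ("Task Leader", "Maintainer"),
    ("Task Leader", "User"),
    ("QA Engineer", "Task Leader"),
    ("Configuration Manager", "Task Leader"),
    ("Process Analyst", "Maintainer"),
]


def dependency_load(edges):
    """Map each actor to (incoming, outgoing) dependency counts."""
    incoming, outgoing = Counter(), Counter()
    for depender, dependee in edges:
        outgoing[depender] += 1   # the depender relies on someone else
        incoming[dependee] += 1   # someone else relies on the dependee
    actors = set(incoming) | set(outgoing)
    return {actor: (incoming[actor], outgoing[actor]) for actor in actors}


load = dependency_load(EDGES)
most_central = max(load, key=lambda actor: sum(load[actor]))
print(most_central, load[most_central])
```

A high incoming count suggests possible overloading of the position, while a high total suggests that the project as a whole depends heavily on it. Weighting each edge by criticality would be a natural refinement.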

Quality Assurance 


Standards conformance and quality inspections were not 
perceived by the task leaders and maintainers as critical. 
They considered these processes mainly bureaucratic. This 
is reflected in the (non-)criticality symbols on the 
corresponding dependencies in Figure 3. This pointed out a 
weakness in the process and organization that could be 
remedied through more suitable inspection procedures and 
better definition and communication of quality needs. 

Requirements 

In Figure 3, Unambiguous requirements (a dependency 
between the Task Leader and the User) is not an enforceable 
soft-goal dependency since the users and maintainers 
(including the Task Leader) belong to two different 
management hierarchies. In other words, the Task Leader 
and User are so far removed from each other in the network 
of management dependencies that the Task Leader has no 
practical recourse for ensuring that the User provides 
unambiguous requirements. Note that the management 
dependencies are included in the AD model, but have been 
omitted from Figure 3 to simplify the diagram. Moreover, 
the fact that this dependency is a soft-goal and not a goal 
raises another issue: standards for defining unambiguous 
requirements should be defined and applied. The lack of such 
standards indicates that the organization is still immature in 
this area. 

Data Collection 

Process analysts attempt to collect data in order to evaluate 
and better predict the maintenance process. However, such a 
procedure is inherently difficult to enforce when maintainers 
do not clearly understand the benefits of such data 
collection, in terms of useful feedback. In terms of the AD 
model, the Process Analyst’s dependency on the Maintainer 
is a vulnerability, with no reciprocal dependency to serve as 
a risk management mechanism available to reduce that 
vulnerability. 

4. Evaluation of the AD Model 
4.1. Advantages 

The notions of enforcement and assurance, as well as the 
modeling of goal and soft-goal dependencies, helped us to 
detect potential problem areas, such as critical dependencies 
that are not enforceable and for which there were no clear 
assurances of commitment. The Task Leader's need for 
unambiguous requirements is an example of such an 
inconsistency. This seemed to fulfill, at least partially, 
requirement R1.

The AD model captured all the information, work, and 
resource flows through resource and task dependencies. This 
allowed us to identify inconsistencies between what some 
agents needed and the support that they were actually 
getting. The problem of the Process Analyst's need for 
development data from maintainers is an example of this. 
We also found that the soft-goal dependency in particular
was useful in highlighting areas in which the environment
was immature. The unambiguous requirements dependency
exemplifies this situation. The various types of
dependencies in the AD model therefore fulfilled requirement
R2.

The actor decomposition extension to the AD model makes 
a clear distinction between various organizational entities 
by defining and differentiating roles, positions and agents as 
different specializations of actors (requirement R3). This 
allowed us to extract different information from different 
versions of the AD model, using different specializations of 
the actors. For example, we found that the model remained 
fairly stable from project to project when nodes represented 
positions (as in Figure 3). However, when we used the role 
specialization, significant differences appeared between 
projects. For example, roles of managers often varied 
significantly, depending on their technical background. 
This served to show that process participants found the 
freedom to tailor their work to the situation, while the 
official organizational structure could remain stable. Roles 
also provide a way to create explicit links between the 
organizational model and any process model composed of 
consistently defined activities (requirement R4). 

Many interactions with various members of the 
maintenance organization were necessary in order to clarify 
inconsistencies and ensure completeness. The AD model
played an important role in this communication, because it 
facilitated the exchange and comparison of perceptions 
about the organizational structure. It served as a good 
communication tool (requirement R5). 

4.2. Issues 

Despite the numerous advantages of the AD model 
mentioned in the previous section, some problems have 
been identified and should be the subject of further research. 

Classification of dependencies 

Once a dependency has been identified, it is not always 
straightforward to classify it according to the defined 
taxonomy (requirement R2). One example is the difficulty 
in distinguishing between a task dependency and a goal 
dependency. A task may be partially defined (e.g., through 
standards) but some significant degree of freedom can exist 
for the dependee whose understanding of the task objectives 
may or may not be complete. It is for this reason that we 
have included no task dependencies in our AD model (see 
Figure 3). Also, the borderline between soft-goals and
(hard-)goals is not always clear. When is a goal sufficiently
defined to be classified as a (hard-)goal? More precise 
guidelines are needed in order to classify dependencies in an 
appropriate fashion. 

Another inadequacy of the classification scheme is in the 
case of information dependencies. As defined, information
dependencies are one type of resource dependency. However,
a need for information is different in nature from a need for 
time, money, or personnel resources. From a data analysis 
point of view, information dependencies are described by 
different attributes than those that would be used to describe 
other resource dependencies. For this reason, any kind of 
information flow analysis necessitates the treatment of 
information as a separate type of dependency. 

Criticality of dependencies 

No precise and unambiguous definition exists to classify a 
dependency as critical, committed or open, which impedes 
fulfillment of requirement R1. Because of this, most of the
dependencies in our context appeared critical since they were 
certainly important from the dependee's perspective. It was, 
from a practical perspective, difficult to determine if they 
were really indispensable. 

Another difficulty with identifying committed and open 
dependencies is that practitioners often do not mention them 
in interviews and they are usually not included explicitly in 
process documents. We have found that direct observation 
is the only effective way to capture such secondary 
dependencies. This is time- and effort-consuming. 
Furthermore, when modeling at the level of detail of our 
model, it is sufficient to include only the primary 
dependencies, which are usually critical.

Interactions between dependencies 

The notions of enforcement, assurance and insurance are 
extremely useful but they are difficult to represent explicitly 
in the AD model representation (requirement R5). These 
notions need to be captured explicitly by the organizational 
model. In the next section, we suggest a way to do this by 
treating these three mechanisms as interactions between 
dependencies. 

5. Suggestions 

Based on this case study and our evaluation of the AD 
model, we provide some suggestions which may be useful 
to those wishing to extend the AD concepts, and to those 
who are engaged in organization modeling.

5.1. An Entity-Relationship Model 

We believe two of the most important problems that arose 
in our work with Actor-Dependency models have a common 
solution. The first issue is the need to clearly define the 
information that needs to be collected, particularly in a 
quantitative analysis effort. The second issue is that of
separating organizational from process concerns since they 
require different types of analyses and solutions. The Entity- 
Relationship Model, shown in Figure 4 and discussed in the 
next two sections, addresses both of these issues.

Figure 4: Modified ER model for AD graphs


5.1.1. ER Model 

Defining precisely the entities and attributes of interest is 
not only necessary for data analysis, but also helps clarify 
the modeling approach itself. One entity that we have added 
in Figure 4 is the Qualification entity. An agent "has" one 
or more qualifications, e.g., maintaining ground satellite 
software systems. Moreover, based on experience, it may 
be determined that some role "requires" specific 
qualifications, e.g., experience with Ada. Comparison of 
the required qualifications and the actual organizational
set-up appears useful for identifying high-risk organizational
patterns. 

We have retained the agent/role/position decomposition of 
an actor defined by Yu et al., which we found very useful.
The ER model also shows "depender" and "dependee" as
ternary relationships. This reflects the fact that a depender or
dependee of a dependency can be either a role or a position.
A role may be functionally dependent on another role in
order to perform a given process activity. Positions are 
usually interdependent because of the need for authorization 
or authority. However, we believe that dependencies are not 
inherent to agents themselves, at least not in our context.

We have also added a new entity, Medium, which is the
communication medium used to implement a particular 
dependency (especially information dependencies). This 
entity is used in some types of quantitative analysis, which 
is described in a later section. Finally, dependencies are 
related to each other and this is captured through the 
interaction relationship, also described in a later section. 
However, this ER model requires further definition (e.g., 
attributes should be specified), validation, and refinement. 
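
A minimal sketch of how the entities and relationships described above might be encoded, so that required and actual qualifications can be compared. It assumes that an agent "occupies" positions and that a position "contains" roles; the class names, field names, and sample data are hypothetical and do not specify the attributes of the ER model in Figure 4.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Qualification:
    name: str


@dataclass
class Role:
    name: str
    requires: set = field(default_factory=set)    # "requires" relationship


@dataclass
class Position:
    name: str
    contains: list = field(default_factory=list)  # roles covered by the position


@dataclass
class Agent:
    name: str
    has: set = field(default_factory=set)         # "has" relationship
    occupies: list = field(default_factory=list)  # positions held by the agent


def missing_qualifications(agent):
    """Map each role the agent performs to the required qualifications it lacks."""
    gaps = {}
    for position in agent.occupies:
        for role in position.contains:
            lacking = role.requires - agent.has
            if lacking:
                gaps[role.name] = lacking
    return gaps


# Hypothetical data, for illustration only.
ada = Qualification("experience with Ada")
ground = Qualification("maintaining ground satellite software systems")
coding = Role("Implement change", requires={ada})
planning = Role("Plan release", requires={ground})
maintainer = Position("Maintainer", contains=[coding, planning])
agent = Agent("agent-1", has={ground}, occupies=[maintainer])

print(missing_qualifications(agent))
```

A non-empty result flags a role whose required qualifications are not covered by the agent filling it, which is one possible reading of a "high-risk organizational pattern".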


5.1.2. Linking an organization model with a 
process model 

The ER model also makes explicit the relationship, and the 
separation, between process and organization. Analysis of 
an organization is aided by the isolation of organizational 
issues (e.g., information flow, distribution of work) from 
purely process concerns (e.g., task scheduling, 
concurrency). This is especially true when dealing with 
quantitative data analysis. Process entities and organization 
entities are described by different quantitative attributes. 
Separation of these attributes clarifies the analysis. 
Although organization and process raise separate issues, 
their effects are related. Understanding the relationship 
between organization and process is crucial to making 
improvements to either aspect of the environment 
(requirement R4). For example, the "performs" relationship 
can link a role to a set of activities, which may be seen as 
lower-level roles. The entity Process Activity is itself 
related to other entities in the process model not specified 
here.

5.2. Dependency Interactions 

Interactions between dependencies need to be modeled. There 
are several different types of these interactions which may 
be seen as relationships from a source to a target 
dependency:

1) Being committed to the source dependency makes the 
commitment to the target dependency more difficult. 
This represents a negative assurance. 

2) The source dependency is an additional motivation to 
the target dependency. This represents a positive 
assurance.

3) The source dependency's failure can provoke the failure
of the target dependency. One dependency's depender is
the other dependency's dependee and vice-versa. This
represents a dependency enforcement.

4) Failure of one dependency is mitigated by the other 
dependency. Both dependencies have the same depender 
but different dependees. This is a dependency 
insurance. 

If a depender can count on many dependees to deliver a
dependum, we can say that the dependency is insured. In
this case, different dependees can be committed to the same
dependum. This can be graphically represented in an AD
graph in a fashion similar to OR branches in AND-OR
trees. For example, Figure 5 shows a case where a Task
Leader (depender) can count on a Maintainer or a Tester (two 
different dependees) for delivering the "Test Plan & Results" 
(a particular dependum). 
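
The structural conditions given for enforcement (item 3) and insurance (item 4) can be checked mechanically on a list of dependencies. The following is a minimal sketch under stated assumptions: the Dependency record, the function names, and the "Change assignment" dependum are hypothetical, and insurance is additionally restricted here to pairs sharing the same dependum, as in the Figure 5 example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Dependency:
    depender: str
    dependee: str
    dependum: str


def is_enforcement(a, b):
    """Item 3: one dependency's depender is the other's dependee and vice versa."""
    return a.depender == b.dependee and a.dependee == b.depender


def is_insurance(a, b):
    """Item 4: same depender, different dependees (here also the same dependum)."""
    return (a.depender == b.depender
            and a.dependee != b.dependee
            and a.dependum == b.dependum)


def classify_pairs(dependencies):
    """Return (kind, a, b) for every pair that is an enforcement or an insurance."""
    found = []
    for a, b in combinations(dependencies, 2):
        if is_enforcement(a, b):
            found.append(("enforcement", a, b))
        elif is_insurance(a, b):
            found.append(("insurance", a, b))
    return found


deps = [
    Dependency("Task Leader", "Maintainer", "Test Plan & Results"),
    Dependency("Task Leader", "Tester", "Test Plan & Results"),
    Dependency("Maintainer", "Task Leader", "Change assignment"),  # hypothetical
    Dependency("Process Analyst", "Maintainer", "Process data"),
]

pairs = classify_pairs(deps)
protected = {d for _kind, a, b in pairs for d in (a, b)}
for kind, a, b in pairs:
    print(kind, "|", a.dependum, "<->", b.dependum)
print("unprotected:", [d.dependum for d in deps if d not in protected])
```

In this sketch the Process Analyst's dependency on the Maintainer is reported as unprotected, which mirrors the vulnerability noted in the Data Collection discussion: there is no reciprocal dependency and no alternative dependee.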

We can also provide a representation for expressing 
assurance interactions between dependencies, shown in 
Figure 6. All nodes in the diagram are dependencies, and the
arrows between them represent either negative or positive
assurances.(1) In Figure 6, all the soft-goals contribute
positively (are positive assurances) to the goal "High-
quality release", but all but one contribute negatively to
"Release on time". All of these dependencies have to be
previously defined in the AD model. 



Our suggestion for representing dependency enforcements is 
a variation of the above. A dependency which enforces 
another dependency can be seen as one which completely 
assures it. So our representation uses the same arrows 
between dependencies shown in Figure 6, with the infinity 
symbol ("∞") in place of the plus ("+") or minus ("-").
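
One way to hold such labeled arrows in a form suitable for analysis is as a list of (source, target, label) triples. This is only a sketch: the soft-goal names below are hypothetical (the text names only the two target goals), and the string "inf" stands in for the infinity label of an enforcement.

```python
from collections import defaultdict

POSITIVE, NEGATIVE, ENFORCES = "+", "-", "inf"

# (source dependency, target dependency, arrow label).
# The soft-goal names are hypothetical, for illustration only.
interactions = [
    ("Thorough testing", "High-quality release", POSITIVE),
    ("Thorough testing", "Release on time", NEGATIVE),
    ("Complete documentation", "High-quality release", POSITIVE),
    ("Complete documentation", "Release on time", NEGATIVE),
    ("Reuse of tested code", "High-quality release", POSITIVE),
    ("Reuse of tested code", "Release on time", POSITIVE),
]


def summarize(interactions):
    """Group the sources acting on each target dependency by arrow label."""
    summary = defaultdict(lambda: defaultdict(list))
    for source, target, label in interactions:
        summary[target][label].append(source)
    return summary


def conflicted_sources(interactions):
    """Sources that assure one dependency positively and another negatively."""
    labels = defaultdict(set)
    for source, _target, label in interactions:
        labels[source].add(label)
    return sorted(s for s, seen in labels.items() if {POSITIVE, NEGATIVE} <= seen)


for target, by_label in summarize(interactions).items():
    print(target, dict(by_label))
print("conflicted:", conflicted_sources(interactions))
```

Sources listed as conflicted are those whose commitment helps one goal while making another harder, which is the trade-off the assurance arrows are meant to expose.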


(1) Readers familiar with the work of Yu et al. will find this
notation similar to their Issue Argumentation model, which
we did not make use of in our work. However, the
notation we present here has different semantics from
the IA model, and the two should not be confused.


5.3. Use of quantitative data 

The use of quantitative data is critical to the useful analysis 
of development processes and organizations. Without 
quantitative information, the analysis results are not 
sufficient to effectively compare alternatives and to make 
decisions. Qualitative analysis, while important for 
intuitive understanding and insight, must be taken further to 
provide a basis for action. For example, [Perry et al.,
1994] have recently attempted to characterize and quantify 
the workload of software developers across software 
development process activities. 



Figure 6: Representing Assurances 

In fact, the AD model is particularly well suited to 
incorporating data, although there is not an explicit facility 
for this in the modeling methodology. One way to perform 
such analysis is to associate attributes with the various AD 
entities (positions, roles, dependencies, etc.). The attributes 
could be used to hold the quantitative information. Then 
analysis tools can be used to analyze the AD graph, by 
making calculations, based on the data, according to the 
structure represented in the graph. 

One type of quantitative analysis, which has already been 
alluded to, is information flow analysis. Information 
dependencies (one type of resource dependency) can have 
attached to them attributes such as frequency and amount of 
information. Each information dependency is also related to 
the different communication media (the entity Medium in 
Figure 4) that it uses to pass information, e.g. phone, 
email, formal and informal documents, formal and informal 
meetings. The many-to-many relationships between 
dependencies and their media also have attributes (e.g., 
effort). Such attributes are captured by defining metrics and 
collecting the appropriate data. An example of such an 
attribute is the computation, for each information 
dependency, of the product of the dependency frequency, the 
amount of information, and the effort associated with the 
medium related to the dependency. This product gives a
quantitative assessment of the effort expended to satisfy the
information dependency. Summing these values for each
pair of actors in the AD graph shows how much effort the
pair expends in passing information to each other. This
information can be used to support such management 
decisions as how to fill different positions, how to 
these people, and what communication media to make 
available. Without quantitative analysis, these decisions are 
subject to guesswork, trial and error, and the personal 
expertise of the manager. For more on metrics for 
organizational information flow, see [Seaman, 1994].
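
The computation described above (frequency times amount of information times medium effort, summed per pair of actors) can be sketched as follows. The record layout, the units, and the sample values are assumptions made for illustration; each record represents one link between an information dependency and a medium it uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InfoLink:
    depender: str
    dependee: str
    frequency: float      # exchanges per week (assumed unit)
    amount: float         # units of information per exchange (assumed unit)
    medium: str
    medium_effort: float  # effort per unit of information on this medium (assumed)


def link_effort(link):
    """Effort expended to satisfy one information dependency over one medium."""
    return link.frequency * link.amount * link.medium_effort


def effort_per_pair(links):
    """Sum the effort for each unordered pair of actors."""
    totals = defaultdict(float)
    for link in links:
        pair = frozenset((link.depender, link.dependee))
        totals[pair] += link_effort(link)
    return dict(totals)


# Hypothetical measurements, for illustration only.
links = [
    InfoLink("Task Leader", "Maintainer", 5, 2, "formal meeting", 0.5),
    InfoLink("Maintainer", "Task Leader", 10, 1, "email", 0.1),
    InfoLink("Process Analyst", "Maintainer", 1, 4, "formal document", 0.25),
]

for pair, effort in effort_per_pair(links).items():
    print(" / ".join(sorted(pair)), "->", round(effort, 2))
```

The pair is treated as unordered because the text speaks of the effort a pair expends in passing information to each other; a directed variant would simply key the totals on (depender, dependee).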

There are several possible applications of quantitative 
analysis in relation to the actor/position/role 
decomposition. For example, during the course of our 
study, we noticed that many differences between projects 
were reflected in variations in the breakdown of positions 
into roles. In other words, the people filling the same 
positions in different projects divided their effort differently 
among their various roles. These variations were usually 
symptomatic of differences in management strategy and 
leadership style. Data needs to be collected to capture the 
important variations in effort breakdown across 
organizations and projects. This data must then be attached
to entities in the AD model so that it can be used to analyze 
variations in job structure. For example, suppose that we 
wanted to find out which projects require a manager with 
technical expertise. If we have quantitative data available on 
the effort breakdown of the different managers, then we can 
easily see which managers spend a high proportion of their 
time on technical activities. This information can be used 
in choosing people to fill different management positions. 
Variations in effort breakdown can also be represented in an 
AD graph by varying the thickness of the lines which join 
a position with its various roles, as shown in Figure 7. 
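
As a rough sketch of the effort-breakdown analysis just described, the share of a manager's time spent on each role can be computed from hours recorded per role. The role names, the set of roles counted as technical, and the hours are hypothetical.

```python
def effort_shares(hours_by_role):
    """Fraction of total effort spent on each role of a position."""
    total = sum(hours_by_role.values())
    if total <= 0:
        raise ValueError("no effort recorded")
    return {role: hours / total for role, hours in hours_by_role.items()}


def technical_share(hours_by_role, technical_roles):
    """Fraction of total effort spent on roles considered technical."""
    shares = effort_shares(hours_by_role)
    return sum(share for role, share in shares.items() if role in technical_roles)


# Hypothetical data: hours per role for the manager position in two projects.
technical_roles = {"Design review", "Coding"}
managers = {
    "Project A": {"Planning": 20, "Reporting": 10, "Design review": 10},
    "Project B": {"Planning": 10, "Design review": 20, "Coding": 10},
}

for project, hours in managers.items():
    share = technical_share(hours, technical_roles)
    print(f"{project}: {share:.0%} of manager effort on technical roles")
```

The same shares could drive the line thickness between a position and its roles in a diagram such as Figure 7.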

Effort breakdown is only one example of the many 
possibilities for analysis of the role/position/agent structure 
of actors. Qualification analysis, which would involve the 
Qualifications entity in Figure 4, is another example. 
Understanding the sharing of tasks and responsibilities is 
another area in which quantitative analysis could be useful. 
All of these involve the evaluation of quantitative attributes 
attached to roles, positions, agents, and the links (occupies, 
contains, performs) between them. 



Figure 7. Representing effort breakdown per role 

5.4. Acquisition process 

Any modeling effort requires that a great deal of information 
be collected from the environment being modeled. Building 
an AD model requires collecting information about all the 
people in the environment, the details of their jobs and 
assignments, whom they depend on to complete their tasks
and reach their goals, etc. Our experience has shown that it
is useful to follow a defined process for gathering this 
information, which we will call an acquisition process. 
The acquisition process which we followed, with 
modifications motivated by our experience, is briefly 
presented in this section. The steps are as follows: 

Step 1: First, we determine the official, (usually)
hierarchical structure of the organization. Normally this 
information can be found in official organization 
charts. This gives us the set of positions and the basic 
reporting hierarchy. 

Step 2: We determine the roles covered by the positions 
by interviewing the people in each position, and then, 
to check for consistency, their supervisors and 
subordinates. Process descriptions, if available, often 
contain some of this information. However, when 
using process descriptions, the modeler must check 
carefully for process conformance. 

Step 3: In this step, we focus on the goal, resource, and 
task dependencies that exist along the vertical links in
the reporting hierarchy. To do this, we interview 
members of different departments or teams, as well as 
the supervisors of those teams. Also, direct observation 
of supervisors, called "shadowing", can be useful in 
determining exactly what is requested of, and provided 
by supervisors for their subordinates. 

Step 4: Here we focus on resource (usually informational) 
and goal dependencies between members of the same 
team. Direct observation (through shadowing or 
observation of meetings) is also useful here. Interviews 
and process documents can also be used to identify 
dependencies. 

Step 5: Finally, we determine the informational and goal 
dependencies between different teams. These are often 
harder to identify, as they are not always explicit. 
Direct observation is especially important here, as 
often actors do not recognize their own subtle 
dependencies on other teams. It is also very important 
in this step to carefully check for enforcement, 
assurance, and insurance mechanisms, since dependers 
and dependees work in different parts of the 
management hierarchy. 

6. Conclusions 

This paper presents the experience of using the Actor-
Dependency modeling approach to model and analyze a
large-scale maintenance organization. The AD model was found
to be very useful at capturing the important properties of 
the organizational context of the maintenance process, and 
aided in the understanding of the flaws found in this 
process. There were, however, some inadequacies of the 
approach, which we have addressed through a set of 
proposed suggestions. However, those must be seen as 
research directions and need to be further investigated. 

One major potential extension of the approach is to use 
quantitative data and analysis methods, within the 
framework of an AD model. Qualitative methods are not
sufficient to differentiate organizations and especially
variations across projects. Measurement is therefore 
necessary for studying organizations. 

The AD model also needs automated support for real-scale 
organizations. This is required to allow the user to analyze a 
real-time organization and define complementary views of 
the studied organizations, at different levels of refinement, 
at different levels of completeness. Automated support is 
especially crucial for the use of quantitative analysis. We 
also need to better define the relationship between the
organization and the development process. Separating 
organizational concerns from process concerns, but 
considering them in conjunction with each other, is a 
crucial element in the comprehensive study of development 
environments (see [Seaman, 1994]). Finally, collecting 
information about an organization for building an accurate 
AD model is a complex task. Therefore, based on 
experience, we need to define an optimal data acquisition 
process that can be tailored to various maintenance 
environments.

Figure 3: AD Model of a Maintenance Organization.

Abstract

Defining product metrics requires a rigorous and disciplined approach, 
because useful metrics depend, to a very large extent, on one's goals and 
assumptions about the studied software process. Unlike in more mature 
scientific fields, it appears difficult to devise a "universal" set of metrics in
software engineering, that can be used across application environments. 

We propose an approach for the definition of product metrics which 
is driven by the experimental goals of measurement, expressed via the 
GQM paradigm, and is based on the mathematical properties of the 
metrics. This approach integrates several research contributions from the 
literature into a consistent, practical and rigorous approach.

The approach we outline should not be considered as a complete and 
definitive solution, but as a starting point for discussion about a product 
metric definition approach widely accepted in the software engineering 
community. At this point, we intend to provide an intellectual process that 
we think is necessary to define sound software product metrics. A precise 
and complete documentation of such an approach will provide the
information needed to make the assessment and reuse of a new metric 
possible. Thus, product metrics are supported by a solid theory which 
facilitates their review and refinement. Moreover, their definition is made 
less exploratory and, as a consequence, one is less likely to identify spurious 
correlations between process and product metrics. 


1. Introduction 

Metrics can help address some of the most critical issues in software 
development and provide support for planning, monitoring, controlling and 
evaluating the software process. However, past approaches for designing 
new software metrics very seldom addressed a specific objective explicitly,
and were usually not based upon assumptions/information about the
characteristics of the development environment under study. These include
descriptions of organizational structure and work procedures, guidelines,
standards, etc. This frequently led to some degree of fuzziness in the metric
definitions, properties, and underlying concepts, making the use of the
metrics difficult, their interpretation hazardous, and the results of the
various validation studies somewhat contradictory [IS88, K88].

This work was supported in part by NASA grant NSG-5123, UMIACS, and NSF grant
01-5-24845. Sandro Morasca was also supported by grants from MURST and CNR.

As a consequence, the number of available metrics in the literature is 
quite large, but the number of used and useful metrics in industry is small. 
It is our position that, in order to make software measurement a viable part 
of the solutions to software engineering issues, metrics must be defined 
according to clear assumptions about the process under study and an 
explicit definition of the specific goal(s) to be addressed. Based on these 
goals and assumptions, desirable metric properties may be identified and 
used to direct and constrain the search for metrics. Such an approach 
appears particularly necessary for product metrics since these metrics are 
often more complex than process metrics and address phenomena that are 
poorly understood. 

The goal of this paper is to specify (based on our experience [BMB93, 
BBH93, BMB94(a)]) a practical metric definition approach, specifically 
aimed at product metrics, and usable as a practical guideline to design 
technically sound and useful metrics. The focus will be the construction of
prediction systems, which is a crucial application of measurement. Not all 
activities in this approach can, at this point, be fully formalized, nor do we 
believe that they will be completely formalized in the future. We think that 
formal techniques can be very effective in providing support for better 
understanding and analyzing software processes and products — indeed, 
we advocate the need for a formal definition of metrics' mathematical 
properties. However, the definition of a metric is a very human-intensive 
activity, which cannot be described and analyzed in a fully formal way. We 
believe that our metric definition approach may be better detailed, refined, 
and tailored to fit the needs of different application contexts. This will be 
made possible through the experience gained by using this metric 
definition approach across several environments. Thus, this work should 
be considered as a contribution towards a satisfactory solution. We point out
what information ought to be provided when one proposes a new metric in 
order to make its review and refinement possible. Furthermore, we 
determine what intellectual process one should go through to ensure the 
technical soundness and practical usefulness of the defined metrics. A 
purely exploratory approach to metric definition would lead to the
experimental evaluation of a large number of
relationships between product metrics (possibly not supported by any 
theory) and development process characteristics (e.g., effort). A simple 
probability calculation [F91] shows that this kind of approach is likely to 
lead to the identification of spurious statistical relationships, e.g., 
correlations uniquely due to coincidence. 
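
The order of magnitude of this risk can be illustrated with a short calculation. This sketch is not necessarily the calculation given in [F91]; it assumes that the tested relationships are statistically independent, that none of them is real, and that each is tested at the same significance level alpha.

```python
def prob_at_least_one_spurious(n_tests, alpha):
    """P(at least one false positive) for n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_spurious(n_tests, alpha):
    """Expected number of false positives among n tests at level alpha."""
    return n_tests * alpha


for n in (1, 5, 20, 100):
    p = prob_at_least_one_spurious(n, alpha=0.05)
    print(f"{n:4d} relationships tested: P(at least one spurious) = {p:.2f}")
```

Under these assumptions, testing 20 unrelated relationships at the 0.05 level already gives roughly a 64% chance of finding at least one "significant" correlation by coincidence.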

Several important research issues involved in the definition of such 
an approach have already been investigated. Basili et al. [B92] [BR88] have 
provided templates to define operational experimental goals for software
measurement. Melton et al. have studied product abstraction properties
[MGB90]. Weyuker [W88] and Tian and Zelkowitz [TZ92] have studied
desirable properties for complexity metrics. In addition, the latter authors 
provided a property-based classification scheme for such metrics. Fenton 
and Melton [FM90], and Zuse [Z90] have investigated the use of 
measurement theory to determine measurement scales. Finally, 
Schneidewind has proposed a validation framework for metrics [S92]. All 
this research needs to be integrated into a consistent and practical metric 
definition approach. 

The paper is organized as follows. In the next section, we provide an 
overview of a practical metric design approach in part inspired by the work 
referenced above, and augmented with some new ideas. Then, in the 
subsequent sections, we separately show each step of our metric design 
approach in detail (Sections 3-8). Section 9 outlines the directions for future 
work. 


2. Overview of Our Metric Definition Approach 

We provide here an overview of the steps composing this approach, as 
illustrated in Figure 1 by a Data Flow Diagram. The remaining sections 
will go in detail through all the issues involved in each of the steps and will 
provide examples. 

Step 1: Define Experimental Goal(s)

Define the experimental goal(s) of the data collection, based on the 
general corporate objectives (e.g. reduce cycle time) and the available 
information about the studied development environment (e.g., 
weaknesses, problems). This step requires goal definition techniques. 
The Goal/Question/Metric paradigm (GQM) [B92] [BR88] is one of the 
approaches that can be used to this end. It provides a set of templates to 
define experimental goals and refines them into concrete and realistic 
questions, which subsequently lead to the definition of metrics. For 
instance, a GQM goal is: 

Analyze software components for the purpose of prediction with respect
to the number of faults from the viewpoint of the project manager. 

(We will use this very simple example to illustrate the steps of our 
approach during this concise overview.) A GQM goal specifies the 
object(s) of study (software components), the purpose of measurement
(prediction), the quality focus of interest (the number of faults), and the
viewpoint (project manager) from which measurement is performed.
The goal strongly impacts all other steps of the metric definition 
approach and the information they need. For instance, the object of 
study and the viewpoint are used to determine the product artifacts and 
information to be taken into account. The GQM paradigm uses 
descriptive models (e.g., definition of complexity metrics) and predictive 
models (e.g., cost models) in order to achieve the experimental goals it
specifies. However, the GQM paradigm does not specify how to generate
these models. In this paper, we expand the GQM paradigm to address 
this issue with respect to product descriptive models. As we will see in 
Section 6, questions about product characteristics are no longer 
necessary in our approach. However, GQM questions on the confidence 
with which assumptions are stated and on the quality (e.g., accuracy of 
collection procedures, granularity) of data to be collected [B92, BR88] still 
need to be asked. We will not address this issue, which is beyond the 
scope of this paper. 
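
The four components of a GQM goal named above can be captured in a small record that regenerates the goal statement. This is a minimal sketch; the class and field names are hypothetical and are not part of the GQM templates themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GQMGoal:
    object_of_study: str
    purpose: str
    quality_focus: str
    viewpoint: str

    def statement(self):
        """Render the goal using the sentence pattern of the example goal."""
        return (
            f"Analyze {self.object_of_study} for the purpose of {self.purpose} "
            f"with respect to {self.quality_focus} "
            f"from the viewpoint of {self.viewpoint}."
        )


goal = GQMGoal(
    object_of_study="software components",
    purpose="prediction",
    quality_focus="the number of faults",
    viewpoint="the project manager",
)
print(goal.statement())
```

Keeping the components separate makes it easy for later steps to consult the object of study and the viewpoint when deciding which product artifacts to take into account.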



Figure 1. Goal-Driven and Property-Based 
Definition Approach for Product Metrics

Step 2: State Assumptions

Based on the object of study and the quality focus (as defined by the 
experimental goals, Step 1), a set of relevant assumptions must be stated 
to embody our intuitive knowledge about the development environment 
and object(s) of study. Assumptions implicitly define an order on the set
of objects of study with respect to the quality focus [MGB90]. For 
instance, components are ordered with respect to their error-proneness. 
Furthermore, while stating these assumptions, relevant measurement 
concepts are identified, e.g., size. For instance, based on developers' 
interviews and a careful study of the development environment, we 
might assume that the larger the number of sequential blocks of 
statements and conditional statements in a program, the higher the 
number of faults. From this assumption, size appears to be a possibly 
relevant measurement concept. As an input for this step, we need 
information on the development environment (e.g., descriptive process 
model), product information and expert opinion as an intuitive basis for 
the assumptions. Besides assumptions, the outputs of this step also 
include 

a set of relevant measurement concepts (e.g., size) 
a better definition of the relevant aspects of the object of study (e.g., 
statement blocks and control flow) 

Step 3: Formalize Relevant Measurement Concepts 

Relevant measurement concepts of interest are formally defined (e.g., 
size, complexity, coupling, cohesion) through their mathematical 
properties. Thus, they are clearly characterized and the search for 
metrics is guided and constrained by these generic properties. This 
makes the search for metrics less exploratory and provides precise 
mathematical criteria for assessing the soundness of the metrics to be 
defined. The mathematical properties characterizing the concepts are 
identified independently from the concept instantiation into a metric 
[TZ92] [Z90] [W88] and are therefore referred to as generic concept 
properties. With reference to our simple example, we can say that a 
property of size is that it is non-negative. As opposed to other papers on 
the subject, we believe that these properties are subjective even though 
some of them might be widely accepted. However, it appears that, as a 
matter of convenience, a universal set of properties should be defined for 
the most important concepts used by the software engineering 
community, as is the case for more mature engineering disciplines. It is 
important, when defining metrics, that one precisely determines the 
meaning of concepts like size or complexity. Existing definitions may, 
however, be reused when available and, conversely, the newly created 
concepts may be stored so that they may be eventually reused. 

Step 4: Define Product Abstractions and Refine Properties 

One needs to define abstractions of the object of study that capture all the 
information (i.e., objects, attributes, relationships) needed to express the 
assumptions and the relevant product aspects they refer to. Some 
examples of product abstractions are data flow graphs, data dependency 
graphs, and control flow graphs. These abstractions will be 
representations of the object of study that will help us express useful 
properties and define metrics. For our example, we may assume that 
control flow graphs are suitable abstractions with respect to the set of 
assumptions and the concepts defined. 

Once useful abstractions are defined, a set of new properties is added 
to the generic concept properties. The objective is to formalize the 
assumptions stated in Step 2: The intuitive ordering of the objects of 
study (e.g., components) with respect to the quality focus (e.g., 
components' error-proneness) must be preserved by the ordering of 
abstractions (e.g., components' control flow graphs) with respect to each 
measurement concept (e.g., components' size) [MGB90]. For instance, 
under the assumption stated in Step 2, and given two control flow 
graphs G1 and G2, we can preserve the intuitive ordering captured by 
the assumption if we define the following size property: the size of G1 is 
greater than the size of G2 if G1 has more nodes than G2. These 
additional properties allow us to tailor the generic concepts to any 
particular quality focus and set of assumptions. It should be noted that 
the added properties must be consistent with the generic properties 
defined in Step 3. These added properties are specific to a given context of 
measurement (i.e., goal, concept, assumptions, abstractions) and are 
referred to as context-dependent properties. At this point, if the defined 
abstractions are not fully adequate to define the context-dependent 
properties, this step can be reiterated. 

Steps 2, 3, and 4, taken as a whole, can be seen as a macro-step in which 
measurement models [F91] (i.e., abstractions and generic/context- 
dependent properties, main outputs of Step 4) are defined based on the 
experimental goals, environmental characteristics, and product 
information (inputs of Step 2). 

Step 5: Define Metrics 

Metrics are defined based upon the defined product abstraction(s), 
concepts and their associated properties. Existing metrics can also be 
reused if they satisfy the defined properties. With respect to our 
example, size can be simply measured as the number of nodes in a 
control flow graph. We are not able, at this point, to select optimal 
metrics from those metrics satisfying the generic and context-dependent 
properties. Experimental validation (Step 6) will help us do so. 
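
As a minimal sketch of the running example, the size metric can be written in a few lines of Python. The sketch assumes a control flow graph is stored as a mapping from each node to its successors; this representation, the function names, and the two sample graphs are illustrative assumptions and not part of the approach itself.

```python
def cfg_nodes(cfg):
    """Return the set of all nodes appearing in a control flow graph.

    `cfg` is a hypothetical representation: a dict mapping each node
    to an iterable of its successor nodes.
    """
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    return nodes


def cfg_size(cfg):
    """Size metric of the running example: the number of nodes in the graph."""
    return len(cfg_nodes(cfg))


# Two made-up graphs, used only to illustrate the ordering property of Step 4.
g1 = {"entry": ["if"], "if": ["then", "else"], "then": ["exit"], "else": ["exit"]}
g2 = {"entry": ["body"], "body": ["exit"]}

# g1 has more nodes than g2, so the size property orders g1 above g2.
g1_is_larger = cfg_size(g1) > cfg_size(g2)
```

Any other metric that satisfies the generic and context-dependent properties would be an equally legitimate candidate at this stage.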

Step 6: Experimental Validation 

After defining metrics in Step 5, the data collected on the actual products 
must be used to validate the assumptions upon which the metrics are 
built. The procedure to follow for experimental validation varies 
significantly depending on the purpose of measurement. With respect to 
prediction, which is our main focus here, one needs to validate the 
product metrics with respect to their statistical relationship to the 
quality focus of interest. For example, we might find a very strong 
correlation between the defined size metric and a simple descriptive 
model of error-proneness, e.g., the number of faults. If the assumptions 
are not supported by the experimental results, we need to repeat from 
Step 2, re-consider the assumptions and properties, then re-define new 
metrics. The definition and validation of metrics are performed 
iteratively until the metric validation yields satisfactory results [S92]. 
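
The statistical check described in this step can be sketched as follows. The sketch assumes a rank correlation (Spearman's coefficient) as the measure of association between a metric and the descriptive model of the quality focus; that choice is made here for illustration only and is not prescribed by the approach, and all function names are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks of the values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_values, fault_counts):
    """Spearman rank correlation between a metric and a fault count."""
    return pearson(ranks(metric_values), ranks(fault_counts))
```

A coefficient close to 1 on real project data would support the assumption that larger components are more error-prone; a weak or negative coefficient would send the analyst back to Step 2.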

It is important to mention that most of the outputs (e.g., product 
abstractions, assumptions) of the steps defined above are reusable. They 
should be packaged and stored so that they can be efficiently and effectively 
reused [BR88]. In a mature development environment, inputs for most of 
those steps should come from reused knowledge. 

Moreover, many refinement loops are not represented in Figure 1. 
For example, as we said in the description of Step 6, poor experimental 
results may trigger the need for refining assumptions. This is an 
important issue that needs further investigation. 

In the remainder of this paper, we will use this definition approach 
to define data flow size and complexity metrics as simple examples. Each 
step will be discussed in detail in a different section. Each section contains 
three subsections: 

Definition of the step 

Examples 

Discussion of related issues. 


3. Define Experimental Goal(s) (Step 1) 

Definition 

In this section, we apply the first step of the Goal/Question/Metric 
paradigm [B92, BR88] to set the measurement goals. Here is a summary of 
templates that can be used to define goals: 

Object of study: products, processes, resources 

Purpose: characterization, evaluation, prediction, improvement, ... 

Quality focus: cost, correctness, defect removal, changes, reliability, ... 

Viewpoint: user, customer, manager, developer, corporation, ... 

A detailed description of the GQM paradigm is beyond the scope of the 
paper. A comprehensive description of the GQM paradigm can be found in 
[B92, BR88]. 

It is important to note that the four goal dimensions mentioned above 
have a direct impact on the remaining steps of the metric definition 
approach and, from a more general perspective, the whole data collection 
program. This can be summarized as follows: 

The object of study helps determine the 

software artifacts that are to be modeled by mathematical 
abstractions in order to be analyzable (Step 4). 

assumptions (Step 2) that may be relevant because they relate to the 
object of study. 

The purpose points out the intended use of the metrics to be defined 
and therefore the 

type of data to be collected, e.g., process improvement requires 
additional data over process prediction (e.g., with respect to 
development effort), in order to allow for the determination of optimal 
techniques and methods. For example, performance data are needed 
in sufficient amount to ensure a minimal level of confidence in the 
improvement decisions. 

amount of data to be collected, e.g., prediction usually requires 
more data than characterization so that the identified relationships 
are statistically significant. Characterization only requires the data 
to be representative of what is to be characterized. 

The quality focus helps determine the 

dependent variable against which the defined product metrics are 
going to be experimentally validated (Step 6) [S92]. This dependent 
variable will in fact be a descriptive model of the quality focus. For 
instance, number of requirement changes per month per thousand of 
lines of code is a descriptive model of requirement instability. Since 
there may be alternative models, validation may require the use of 
several dependent variables. In this case, if inconsistent 
experimental results are obtained, the dependent variables are very 
likely to actually capture different quality focuses. 

assumptions (Step 2) linking the object of study characteristics to the 
quality focus of interest. 

The viewpoint helps determine 

the point in time at which characterizations, predictions, or 
evaluations should be carried out and therefore what product 
information will be available to define product abstractions and 
metrics (Steps 4, 5). 

what information is costly or difficult to acquire and consequently, 
what information should be left out of the model if it does not show a 
sufficiently strong impact on the quality focus (Steps 5, 6). 

the definition of descriptive models of the quality focus. For example, 
from the user's point of view, error-proneness may be defined as the 
mean time to failure, whereas, from the tester's point of view, it may be 
defined as the number of errors occurring during the test phase. 

In this framework, we will not derive questions from goals as suggested by 
the GQM paradigm. A justification will be provided in Section 6. 

Example of a goal 

Let us assume that one of the corporate objectives is to reduce development 
time, and more particularly the time spent on testing activities. Assuming 
that previous studies have shown that errors are usually concentrated in a 
small number of "difficult" components (example of information about the 
development environment), the following experimental goal seems 
pertinent. By identifying error-prone components, we may concentrate 
verification activities where needed and, thereby, reduce effort. 

Goal G 

Object of study: component 
Purpose: prediction 
Quality focus: error-proneness 
Viewpoint: tester 
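
The four goal dimensions can be recorded in a small data structure so that later steps can refer to them explicitly. The following Python sketch is illustrative only: the class name, field names, and the wording produced by `describe` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """One experimental goal, described along the four template dimensions."""
    object_of_study: str
    purpose: str
    quality_focus: str
    viewpoint: str

    def describe(self):
        """Render the goal as a single sentence (the wording is illustrative)."""
        return (
            f"Analyze {self.object_of_study} for the purpose of {self.purpose} "
            f"with respect to {self.quality_focus} "
            f"from the viewpoint of the {self.viewpoint}"
        )


# Goal G from the example above.
goal_g = Goal(
    object_of_study="component",
    purpose="prediction",
    quality_focus="error-proneness",
    viewpoint="tester",
)
```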

Let us take an example to illustrate the impact of the defined experimental 
goal on our metric definition approach. We know from the object of study 
that we have to define relevant component mathematical abstractions so we 
can derive component metrics. We know from the purpose of measurement 
that we need to collect enough data about the quality focus to allow a 
statistically significant validation of the relationships between the 
component metrics to be defined and the quality focus. This requires that 
we better define our quality focus: error-proneness. Very likely, we need to 
determine precisely how to count defects, e.g., what testing and inspection 
phases should be taken into account? Are all errors equal, or should they be 
weighted according to a predefined error taxonomy? Such questions are 
also dependent on the particular viewpoint. In our example, testers want to 
find out where errors are and more particularly critical errors (according 
to their own definition of criticality). Therefore, errors will be weighted 
according to the level of criticality of their consequences. Similarly, errors 
could be weighted according to the correction effort they require. The 
determination of suitable error counting procedures will depend on the 
particular application of the predictive model to be built and therefore on the 
viewpoint of our experimental goal. 
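
The weighting of errors by criticality described above could be sketched as follows. The criticality levels and the weights are hypothetical placeholders; in practice they would come from the testers' own definition of criticality.

```python
def weighted_error_count(error_levels, weights):
    """Sum the weights of the errors found in one component.

    error_levels: iterable with one criticality label per error.
    weights: dict mapping each criticality label to a numeric weight.
    """
    return sum(weights[level] for level in error_levels)


# Hypothetical weights: these values are assumptions, not taken from the paper.
example_weights = {"minor": 1, "major": 3, "critical": 5}
```

With these weights, a component with two minor errors and one critical error scores 7, whereas an unweighted count would report 3. Replacing the weights with correction effort gives the alternative scheme mentioned in the text.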

In the next sections, we will discuss more precisely the impact 
of experimental goals on the definition of software product metrics. 


Discussion 

The definition of the goals is a fundamental phase, since all other steps in 
our approach are affected by the experimental goals. Therefore, extra care 
must be used when setting the goals. Specific descriptive process models 
and knowledge acquisition techniques can be used to better understand the 
issues that are most relevant to software development in a software 
organization. Careful application of the GQM paradigm provides two 
important results: 

Data collection is ensured to respond to the specific needs of the 
software organization; 

The derivation of metrics from explicit goals and the definition of 
explicit measurement models (output of Step 4 of our approach) allow 
the analyst to specify a priori the interpretation mechanisms 
associated with the collected data. This prevents a posteriori search 
for patterns which are not based on precise assumptions. 


4. State Assumptions (Step 2) 

Definition 

We have to state assumptions (see examples below) about some aspects of 
the software process under study that are relevant to the experimental 
goals. These assumptions capture our intuitive understanding of the 
studied phenomena and need to be explicit so they can be discussed, 
questioned and refined. Various sources of information can be used to 
devise pertinent assumptions. A thorough understanding of the working 
procedures, methodologies and techniques used in the studied development 
environment, combined with the interview of domain experts, is usually 
very helpful [BBK94]. The set of assumptions defines an ordering on the set 
of products [MGB90] with respect to the quality focus. This ordering will be 
used to evaluate the adequacy of the metrics defined in the remainder of 
this approach. 


An assumption is a statement believed to be true about the relationship 
between the quality focus and the characteristics of the object of study. 


Stating assumptions helps identify the measurement concepts (e.g., size, 
complexity) that are characteristics of the object of study relevant to the 
goal. In addition, assumptions allow us to identify artifacts, or parts of 
artifacts (e.g., definitions, condition expressions), that must be taken into 
account for the definition of suitable product abstractions. 

Examples of assumptions 

In order to capture our intuitive understanding about data flow size and 
complexity, we define the following assumptions. 

Assumption 1: 

The larger the number of definitions and condition expressions, the larger 
the likelihood of error. 

Assumption 2: 

The larger the number of definitions and condition expressions 
"depending" on a definition D, the larger the probability of ripple effects if D 
is to be created or modified. 

Assumption 3: 

The larger the number of definitions on which a definition or a condition 
expression D "depends", the more difficult it is to create and understand D. 

Assumption 4: 

The larger the "distance" between two definitions or condition expressions 
D1 and D2, where D2 depends on D1, the more difficult the control of ripple 
effects on D2 if D1 is to be created or modified. 

The concepts between quotes are not defined: they make sense on an 
intuitive level. They will be formally defined later, either via the definition of 
product abstractions (as is the case of "dependency"), or additional concept 
properties in Step 4 (as is the case of "distance"). 

Discussion 

At this point, several sets of consistent assumptions could be defined. This 
would lead to multiple categories of metrics, reflecting the inherent 
uncertainty associated with the assumptions. In Step 6, experimental 
results will eventually help us select the best category of metrics for each 
concept. For example, we could assume that when a condition expression 
CE (as opposed to a definition) depends on a definition D, this increases the 
probability of misunderstanding and ripple effect between D and CE. This 
stems from the fact that condition expressions also have an implicit effect 
on the definitions in the block they control. This additional assumption 
(referred to as Assumption 5) affects the metric definition approach, as we 
show in the following steps. 


5. Formalize Relevant Measurement Concepts (Step 3) 

Definition 

The relevant measurement concepts are defined by specifying the 
mathematical properties that are believed to characterize them. In our 
framework, these properties should be used to constrain and guide the 
search for new metrics. In addition, as shown in [BMB94(b)], intuition may 
lead to metrics showing awkward mathematical properties (see footnote 1). One should 
always make sure that a metric exhibits properties that are essential for its 
technical soundness. These properties are independent from both any 
specific product abstraction and any future instantiation of the concept into 
any specific metric. Therefore, they are called generic. 


1 The authors of this paper were several times misled in the definition of software metrics 
that were intuitively appealing, but, after a more thorough analysis, showed inconvenient 
and unsubstantiated properties. 

A measurement concept is a class of metrics characterized by a set of 
mathematical properties (i.e., generic concept properties) and associated 
with an intuitive software product characteristic, e.g., size. 


The generic properties associated with a measurement concept should not 
be contradictory: there must be at least one metric that satisfies them. 
Moreover, these properties should hold for the admissible transformations 
[Z90] of the scale of measurement (i.e., nominal, ordinal, interval, ratio, 
absolute) on which it is intended to define metrics. In other words, there 
should not be any contradiction between the scale of measurement which is 
assumed while using and interpreting a defined metric and its generic 
properties. 

Examples of concepts and their generic properties 

In this example, we provide properties that are, in our opinion, generic for 
metrics related to size and complexity. These concepts are believed to be 
relevant with respect to many experimental goals and applications, and in 
particular with respect to the goal defined above. As for complexity, the 
properties we define are related to the properties several authors have 
already provided in the literature (see [US92, TZ92, W88]). However, since 
we may want to use these properties on artifacts other than software code 
and on abstractions other than control-flow graphs, we formalized them in 
a more general manner. A thorough discussion of these properties — which 
is beyond the scope of this paper — can be found in [BMB94(b)]. These 
properties are provided as an example. Nevertheless, in the metric 
definition approach we outline in this paper, other sets of properties [TZ92] 
[W88] may be used, since the selection of properties is, to some extent, 
subjective. 

Size and complexity are concepts related to systems, in general, i.e., 
one can speak about the size of a system and the complexity of a system. In 
our general framework — recall that we want these properties to be as 
independent as possible from any specific product abstraction — , a system 
is characterized by its elements and the relationships between them. 

Definition 1: Representation of Systems and Modules 

A system S will be represented as a pair <E,R>, where E represents the set 
of elements of S, and R is a binary relation on E ($R \subseteq E \times E$) representing the 
relationships between S's elements. 

Given a system S = <E,R>, a system $m = \langle E_m, R_m \rangle$ is a module of S if 
and only if $E_m \subseteq E$, $R_m \subseteq E \times E$, and $R_m \subseteq R$. This will be denoted by $m \subseteq S$. 

As an example, E can be defined as the set of code statements and R as the 
set of control flows from one statement to another. A module m may be a 
code fragment or a subprogram. 
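
Definition 1 translates directly into a small data structure. The following Python sketch is one possible encoding, assuming elements are hashable values and relationships are ordered pairs; the names `System`, `make_system`, and `is_module` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """A system S = <E, R>: a set of elements E and a binary relation R on E."""
    elements: frozenset
    relations: frozenset  # pairs (source, target), both taken from `elements`

    def __post_init__(self):
        # Definition 1 requires R to be a subset of E x E.
        for source, target in self.relations:
            if source not in self.elements or target not in self.elements:
                raise ValueError("every relationship must link elements of E")


def make_system(elements, relations=()):
    """Convenience constructor accepting any iterables."""
    return System(frozenset(elements), frozenset(relations))


def is_module(m, s):
    """True if m is a module of s: E_m is within E and R_m is within R."""
    return m.elements <= s.elements and m.relations <= s.relations


# Illustrative example: statements as elements, control flows as relationships.
program = make_system({"s1", "s2", "s3"}, {("s1", "s2"), ("s2", "s3")})
fragment = make_system({"s1", "s2"}, {("s1", "s2")})
```

Here `fragment` is a module of `program`, but not the other way around.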

Concept: Size 

Intuitively, size is recognized as being an important measurement concept. 
According to our framework, size cannot be negative (property Size.1), and 
we expect it to be null when a system does not contain any elements 
(property Size.2). When modules do not have elements in common, we 
expect size to be additive (property Size.3). 

Definition 2: Size 

The size of a system S is a function Size(S) that is characterized by the 
following properties Size.1 - Size.3. 


Property Size.1: Non-negativity 

The size of a system S = <E,R> is non-negative 

$$\mathrm{Size}(S) \ge 0 \qquad \text{(Size.I)}$$


Property Size.2: Null Value 

The size of a system S = <E,R> is null if E is empty 

$$E = \emptyset \Rightarrow \mathrm{Size}(S) = 0 \qquad \text{(Size.II)}$$


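Property Size.3: Module Additivity 

The size of a system S = <E,R> is equal to the sum of the sizes of two of its 
modules $m_1 = \langle E_{m_1}, R_{m_1} \rangle$ and $m_2 = \langle E_{m_2}, R_{m_2} \rangle$ such that any element of S 
is an element of either $m_1$ or $m_2$ 

$$(m_1 \subseteq S \text{ and } m_2 \subseteq S \text{ and } E = E_{m_1} \cup E_{m_2} \text{ and } E_{m_1} \cap E_{m_2} = \emptyset) \Rightarrow \mathrm{Size}(S) = \mathrm{Size}(m_1) + \mathrm{Size}(m_2) \qquad \text{(Size.III)}$$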

The last property Size.3 provides the means to compute the size of a system 
S = <E,R> from the knowledge of the size of its disjoint modules 
$m_e = \langle \{e\}, R_e \rangle$, each of whose set of elements is composed of a different 
element e of E (see footnote 2). 

$$\mathrm{Size}(S) = \sum_{e \in E} \mathrm{Size}(m_e) \qquad \text{(Size.IV)}$$

2 For each $m_e$, either $R_e = \emptyset$ or $R_e = \{\langle e, e \rangle\}$. 

Therefore, adding elements to a system cannot decrease its size 

$$(S' = \langle E', R' \rangle \text{ and } S'' = \langle E'', R'' \rangle \text{ and } E' \subseteq E'') \Rightarrow \mathrm{Size}(S') \le \mathrm{Size}(S'') \qquad \text{(Size.V)}$$

From the above properties Size.1 - Size.3, it also follows that the size of a 
system S = <E,R> is not greater than the sum of the sizes of any pair of its 
modules $m_1 = \langle E_{m_1}, R_{m_1} \rangle$ and $m_2 = \langle E_{m_2}, R_{m_2} \rangle$, such that any element of S 
is an element of $m_1$, or $m_2$, or both, i.e., 

$$(m_1 \subseteq S \text{ and } m_2 \subseteq S \text{ and } E = E_{m_1} \cup E_{m_2}) \Rightarrow \mathrm{Size}(S) \le \mathrm{Size}(m_1) + \mathrm{Size}(m_2) \qquad \text{(Size.VI)}$$

The size of a system built by merging such modules cannot be greater than 
the sum of the sizes of the modules, due to the presence of common 
elements (e.g., lines of code, operators, class methods). 

Properties Size.1 - Size.3 hold when applying the admissible transformations 
of the ratio scale [F91]. Therefore, there is no contradiction between our 
concept of size and the definition of size metrics on a ratio scale. 
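
As a quick sanity check, a candidate metric can be tested against the size properties on concrete inputs. The sketch below assumes the simplest candidate, the number of elements of the system, and represents a module only by its element set; the function names are hypothetical. Such a spot check on specific inputs can reveal a violation but is not a proof that the properties hold in general.

```python
def size(elements):
    """Candidate size metric: the number of elements of the system."""
    return len(elements)


def check_size_properties(e1, e2):
    """Spot-check the size properties for the system whose elements are e1 | e2.

    e1 and e2 are the element sets of two modules that together cover the system.
    Returns a dict mapping each checked property to True or False.
    """
    union = e1 | e2
    checks = {
        "non_negativity": size(union) >= 0,                  # Size.I
        "null_value": size(set()) == 0,                      # Size.II
        "upper_bound": size(union) <= size(e1) + size(e2),   # Size.VI
    }
    if not (e1 & e2):
        # Size.III applies only when the modules share no elements.
        checks["module_additivity"] = size(union) == size(e1) + size(e2)
    return checks
```

For overlapping modules only the upper bound of Size.VI is checked, since additivity is not required in that case.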


Concept: Complexity 

Intuitively, the complexity of a product is a measurement concept that is 
considered extremely relevant to system properties. It has been studied by 
several researchers [BMB94(b)]. In our framework, we expect product 
complexity to be non-negative (property Complexity.1) and to be null 
(property Complexity.2) when there are no relationships between the 
elements of a system. However, it could be argued that the complexity of a 
system whose elements are not connected to each other does not 
necessarily need to be null, because each element of E may have some 
complexity of its own. In our view, complexity is a system property that 
depends on the relationships between elements, and is not an isolated 
element's property [BMB94(b)]. 

Complexity should not be sensitive to representation conventions with 
respect to the direction of arcs representing system relationships (property 
Complexity.3). A relation can be represented in either an "active" (R) or 
"passive" ($R^{-1}$) form. The system and the relationships between its elements 
are not affected by these two equivalent representation conventions, so a 
complexity metric should be insensitive to this. 

Also, the complexity of a system S should be at least as much as the 
sum of the complexities of any collections of its modules, such that no two 
modules share relationships, but may only share elements (property 
Complexity.4). We believe that this property is the one that most strongly 
differentiates complexity from the other system concepts. Intuitively, this 
property may be explained by two phenomena. First, the transitive closure 
of R is a larger graph than the graph obtained as the union of the transitive 
closures of R' and R" (where R' and R" are contained in R). As a 
consequence, if any kind of indirect (i.e., transitive ) relationships between 
elements is considered in the computation of complexity, then the 
complexity of S may be larger than the sum of its modules' complexities, 
when the modules do not share any relationship. Otherwise, they are equal. 
Second, merging modules may implicitly generate relationships (note 
$R' \cup R'' \subseteq R$ in formula Complexity.IV's premise) between the elements of 
each module (e.g., definition-use relationships may be created when blocks 
are merged into a common system). As a consequence of the above 
properties, system complexity should not decrease when the set of system 
relationships is increased (property Complexity.4). 

Last, the complexity of a system made of disjoint modules is the sum 
of the complexities of the single modules (property Complexity.5). 
Consistent with property Complexity.4, this property is intuitively justified 
by the fact that the transitive closure of a graph composed of several disjoint 
subgraphs is equal to the union of the transitive closures of each subgraph 
taken in isolation. Furthermore, if two modules are put together in the 
same system, but they are not merged, i.e., they are still two disjoint 
modules in this system, then no additional relationships are generated from 
the elements of one to the elements of the other. 

Definition 3: Complexity 

The complexity of a system S is a function Complexity(S) that is 
characterized by the following properties Complexity.1 - Complexity.5. 


Property Complexity.1: Non-negativity 

The complexity of a system S = <E,R> is non-negative 

$$\mathrm{Complexity}(S) \ge 0 \qquad \text{(Complexity.I)}$$


Property Complexity.2: Null Value 

The complexity of a system S = <E,R> is null if R is empty 

$$R = \emptyset \Rightarrow \mathrm{Complexity}(S) = 0 \qquad \text{(Complexity.II)}$$


Property Complexity.3: Symmetry 

The complexity of a system S = <E,R> does not depend on the convention 
chosen to represent the relationships between its elements 

$$(S = \langle E, R \rangle \text{ and } S^{-1} = \langle E, R^{-1} \rangle) \Rightarrow \mathrm{Complexity}(S) = \mathrm{Complexity}(S^{-1}) \qquad \text{(Complexity.III)}$$


Property Complexity.4: Module Monotonicity 

The complexity of a system S = <E,R> is no less than the sum of the 
complexities of any two of its modules with no relationships in common 

$$(S = \langle E, R \rangle \text{ and } m_1 = \langle E_{m_1}, R_{m_1} \rangle \text{ and } m_2 = \langle E_{m_2}, R_{m_2} \rangle \text{ and } E_{m_1} \cup E_{m_2} \subseteq E \text{ and } R_{m_1} \cup R_{m_2} \subseteq R \text{ and } R_{m_1} \cap R_{m_2} = \emptyset) \Rightarrow \mathrm{Complexity}(S) \ge \mathrm{Complexity}(m_1) + \mathrm{Complexity}(m_2) \qquad \text{(Complexity.IV)}$$


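Property Complexity.5: Disjoint Module Additivity 

The complexity of a system S = <E,R> composed of two disjoint modules 
$m_1 = \langle E_{m_1}, R_{m_1} \rangle$, $m_2 = \langle E_{m_2}, R_{m_2} \rangle$ is equal to the sum of the complexities of 
the two modules 

$$(S = \langle E_{m_1} \cup E_{m_2}, R_{m_1} \cup R_{m_2} \rangle \text{ and } E_{m_1} \cap E_{m_2} = \emptyset \text{ and } R_{m_1} \cap R_{m_2} = \emptyset) \Rightarrow \mathrm{Complexity}(S) = \mathrm{Complexity}(m_1) + \mathrm{Complexity}(m_2) \qquad \text{(Complexity.V)}$$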

As a consequence of the above properties Complexity.1 - Complexity.5, it can 
be shown that the complexity of a system is no less than the complexity of 
any of its modules, i.e., adding relationships between elements of a system 
does not decrease its complexity 

$$(S' = \langle E, R' \rangle \text{ and } S'' = \langle E, R'' \rangle \text{ and } R' \subseteq R'') \Rightarrow \mathrm{Complexity}(S') \le \mathrm{Complexity}(S'') \qquad \text{(Complexity.VI)}$$

Properties Complexity.1 - Complexity.5 hold when applying the admissible 
transformations of the ratio scale. Therefore, there is no contradiction 
between our concept of Complexity and the definition of Complexity metrics 
on a ratio scale. 
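
The complexity properties can be spot-checked in the same way as the size properties. The sketch below assumes a deliberately simple candidate metric, the number of relationships |R|, chosen here only for illustration; it is not the metric defined later in the paper, and the function names are hypothetical.

```python
def complexity(relations):
    """Candidate complexity metric (an assumption): the number of relationships."""
    return len(relations)


def inverse(relations):
    """Return the inverse relation, i.e., every arc with its direction reversed."""
    return {(target, source) for source, target in relations}


def check_complexity_properties(r, r1, r2):
    """Spot-check the complexity properties on one system and two of its modules.

    r is the relation of the system; r1 and r2 are the relations of two modules
    that share no relationships and whose relationships all belong to r.
    """
    if not (r1 | r2) <= r or (r1 & r2):
        raise ValueError("r1 and r2 must be disjoint subsets of r")
    return {
        "non_negativity": complexity(r) >= 0,                                    # I
        "null_value": complexity(set()) == 0,                                    # II
        "symmetry": complexity(r) == complexity(inverse(r)),                     # III
        "module_monotonicity": complexity(r) >= complexity(r1) + complexity(r2), # IV
        # Complexity.V also assumes the modules share no elements; for this
        # metric the equality holds whenever the two relations are disjoint.
        "disjoint_additivity": complexity(r1 | r2) == complexity(r1) + complexity(r2),
    }
```

Passing the check on a few inputs does not prove the properties in general, but a single failure is enough to reject a candidate metric.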

Discussion 

The paragraphs above, stating the motivations and justifications for size 
and complexity concepts, illustrate the subjectivity of the metric definition 
approach. However, it is important that all concept properties be explicitly 
justified and motivated so that their limitations may be understood and the 
discussion on their validity may be facilitated. 


6. Define Product Abstractions and Refine Concept Properties 

(Step 4) 

Definition 

We first need to define an abstraction that helps us precisely capture and 
define all the concepts involved in the stated assumptions. Abstractions are 
mathematical representations of the product(s) (usually graphs). Products 
have to be mapped into abstractions so they become analyzable and some of 
their attributes become quantifiable [MGBB90]. The choice should be 
entirely guided by the experimental goals (i.e., the object of study and the 
quality focus) and the set of assumptions, that is, the abstractions must 
capture all the concepts involved in the set of assumptions related to the 
object of study. The mapping from the product to the abstraction needs to be 
checked for completeness, i.e., Does the abstraction contain all the 
relationships between nodes that one wants to capture? Is the level of 
granularity of the abstraction nodes sufficient to represent accurately the 
product? One way of assessing the suitability of an abstraction is to study 
the effect of relevant modifications in the product and assess its impact on 
the abstraction, e.g., number of nodes and edges added or removed, change 
of topology in a graph. Several abstractions capturing control flow, data 
flow and data dependency information are available in the literature [M90, 
BBC88, O80]. However, an even larger variety of abstractions can be derived 
from software products. 

The set of properties associated with each concept is expanded so as 
to formalize the order existing on the set of abstractions with respect to each 
concept as defined by the assumptions. Therefore, the order formalized by 
the newly introduced properties is intended to preserve the order defined by 
the assumptions so that concepts have a monotonic relationship with the 
quality focus of interest. For example, given that the quality focus is error- 
proneness and that a Definition-Use (D-U) graph DUG1 is defined as more 
complex than another graph DUG2 and assuming that there is a 
monotonic relationship between error-proneness and complexity, we expect 
the assumptions to state that the product corresponding to DUG1 is more 
error-prone than that of DUG2. 

These properties are specific to a given context of measurement (i.e., 
goal, concept, assumptions, abstractions) and are referred to as context- 
dependent properties. They will, most of the time, capture effects on the 
ordering of abstractions when modifications are performed on these 
abstractions. These modifications will often be what are referred to as atomic 
modifications in [Z90]: adding, removing, moving, or substituting an 
edge/node. They will be useful in order to constrain and guide the search 
for metrics (Step 5). 

Examples 

In our example, D-U graphs are a suitable abstraction since they capture 
concepts such as definitions, condition expressions, uses. D-U graphs are 
directed graphs where nodes are statements or conditions and arcs are 
definition-use clear paths [RW82]. Moreover, concepts such as 
"dependencies" or "distance" can be derived from such graphs. A definition 
or a condition expression "depends" on a definition when the 
variable/constant defined in the latter is used in the former. A suitable 
definition of "distance" between two definitions will be provided in the next 
section. 

Concept: Size 

Property CD1: Count of definitions 

If a graph DUG1 has at least as many definitions and condition expressions 
as another graph DUG2, then $\mathrm{Size}(\mathrm{DUG1}) \ge \mathrm{Size}(\mathrm{DUG2})$. 

The above property CD1 is not implied by the generic properties Size.1 - 
Size.3, since DUG1 and DUG2 have nothing to do with each other, i.e., they 
are not related by any inclusion relationship (DUG2 is not necessarily 
included in DUG1). 

Concept: Complexity 

Property CD2: Sum of distances 

Let DUG1 and DUG2 be two Definition-Use graphs. If the sum of the 
distances between all pairs of nodes in DUG1 is greater than the sum of 
distances between all pairs of nodes in DUG2, then 
$\mathrm{Complexity}(\mathrm{DUG1}) > \mathrm{Complexity}(\mathrm{DUG2})$. 

The distance between two nodes is the number of arcs in the longest path 
between the two nodes that contains no repetitions of elementary cycles 
(cycles that do not traverse the same arc twice). As an example, the 
distance between nodes b and c in the D-U graph of Figure 2 is 4, i.e., the 
number of arcs of the path {<b,c>,<c,e>,<e,b>,<b,c>}. In this path, the arc 
<b,c> is traversed twice, but it is only traversed once in the cycles 
{<b,c>,<c,e>,<e,b>} and {<c,e>,<e,b>,<b,c>} contained in the path. When 
several paths exist between two nodes, we select the longest one because the 
shortest or average path distance would not satisfy the monotonicity 
property (Complexity.4). For instance, adding an arc in a graph may 
decrease the length of the shortest path between two nodes. The distance 
between two unrelated nodes is zero because the absence of relation does not 
add any complexity, consistent with the generic property Complexity.2. 
This shows how generic properties constrain the definition of metrics and 
help make the right decisions. As an example of distance calculations, 
consider the D-U graph in Figure 2. 

If Assumption 5 is considered, a different abstraction is necessary: 
Data-Dependency (D-D) graphs [BBC88]. This abstraction captures the links 
between condition expressions and the definitions they can affect. In this 
case, the following property holds: 

Property CD3: Definitions versus condition expressions 

Let DDG1 and DDG2 be two Data Dependency graphs. If DDG2 is identical 
to DDG1 except for the fact that one of the condition expressions of DDG1 
has been substituted with a definition to form DDG2, then 
Complexity(DDG1) > Complexity(DDG2). In other words, a condition 
expression is the source of more complexity than a definition. 

◊ 
Figure 2. Example of D-U graph 


The distances between the nodes in Figure 2 are computed in Table 1. 



        a    b    c    d    e

   a    0    5    6    0    6
   b    0    3    4    0    5
   c    0    5    3    0    4
   d    0    5    6    0    4
   e    0    4    5    0    3

Table 1. Distances between the nodes of the D-U graph in Figure 2 


Discussion 

According to the GQM paradigm, questions must be derived from goals. In 
our particular framework, questions about product characteristics (e.g., 
what is the complexity of a component?) are not necessary and the outputs 
of Steps 2, 3, and 4 may be seen as a more rigorous substitute for questions. 
Thus, metrics are not intended to answer questions but to validate 
assumptions. However, as we have shown, there may be aspects of the 
relevant environmental characteristics that cannot be explicitly modeled, 
e.g., the quality of the data and the validity of the assumptions, so questions 
may still be necessary to support the full interpretation of the metrics. 

As pointed out in [FM90, F94], not all abstractions may be comparable 
with respect to a particular measurement concept. In such cases, it 
appears difficult to define a total order on the set of abstractions and only a 
partial order can be obtained [MGB90]. Ultimately, statistical analysis can 
only be conducted independently on comparable subsets of abstractions. 
One of the main difficulties of this step is to ensure that the set of 
context-dependent properties is complete. Completeness is reached when 
the properties can fully describe the ordering of abstractions, i.e., when any 
pair of comparable abstractions can be ordered by using the stated 
properties or their combination. 

It is also necessary to verify that the newly introduced context- 
dependent properties define metrics whose scales are consistent with those 
defined by the generic properties, i.e., ratio, interval, ordinal, nominal. 


7. Define Metrics (Step 5) 

Definition 

For each concept, metrics are defined by using the abstractions' elements 
and relationships and are checked against the concepts' generic and 
context-dependent properties. Management and resource constraints are 
taken into account at this point for defining convenient metrics. This step 
may require approximations which must be performed explicitly, based on 
a solid theory, and in a controlled manner. At this stage, we are not able to 
select the best among alternative metrics satisfying generic and context- 
dependent properties. Experimental validation (Step 6) will help us perform 
such a selection. As a necessary precondition to carrying out a meaningful 
experimental validation, the measurement scale (i.e., nominal, ordinal, 
interval, ratio, absolute [FM90], [Z90]) of the metrics must be clearly 
identified. This prevents metrics from being misused (e.g., taking the 
average value of an ordinal metric, which is meaningless). 
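
The remark about ordinal metrics can be illustrated with a small sketch. The data and the relabelling below are hypothetical and chosen only for illustration: an ordinal scale is preserved by any order-preserving relabelling of its levels, yet such a relabelling can reverse a comparison of means, whereas a comparison of medians is unaffected.

```python
from statistics import mean, median

# Hypothetical ordinal ratings for two groups of components (illustrative data only).
group_a = [1, 1, 10]
group_b = [3, 3, 3]

def relabel(level):
    """An order-preserving relabelling of the ordinal levels 1 < 3 < 10."""
    return {1: 1, 3: 3, 10: 4}[level]

group_a_relabelled = [relabel(x) for x in group_a]
group_b_relabelled = [relabel(x) for x in group_b]

# Comparison of means before and after the relabelling.
mean_a_greater_before = mean(group_a) > mean(group_b)
mean_a_greater_after = mean(group_a_relabelled) > mean(group_b_relabelled)

# Comparison of medians before and after the relabelling.
median_a_greater_before = median(group_a) > median(group_b)
median_a_greater_after = median(group_a_relabelled) > median(group_b_relabelled)

print("means agree:  ", mean_a_greater_before == mean_a_greater_after)
print("medians agree:", median_a_greater_before == median_a_greater_after)
```

Since the conclusion drawn from the means depends on an arbitrary choice of labels, it carries no information on an ordinal scale; the median comparison does not have this problem.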

Examples 

Concept: Size 

A simple size metric is given by the number of definitions and condition 
expressions, i.e., the number of nodes in the Definition-Use graph. Other 
size metrics can be devised, by associating a weight with each node. 
However, this would require that additional assumptions be made. 

Concept: Complexity 

The most straightforward metric that comes to mind is the number of arcs 
in the graph. However, this does not take into account Assumption 4 since 
distances between pairs of nodes may not have an impact on the metric. In 
this context, a complexity metric that seems relevant and that satisfies the 
generic and context-dependent properties is the sum of distances between 
every pair of nodes in the DUG graph. 
\[ Cplx(DUG) = \sum_{i=1}^{|E|} \sum_{j=1}^{|E|} Distance(Node_i, Node_j) \qquad (1) \]

where Node_i, Node_j ∈ E. 
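
As a worked illustration of formula (1), the following sketch sums the distances listed in Table 1 for the D-U graph of Figure 2. The matrix is transcribed from the table as printed, with rows read as source nodes and columns as target nodes (an assumption, consistent with the stated distance of 4 from b to c); the function name `cplx` is ours.

```python
# Distances transcribed from Table 1 (row = source node, column = target node).
NODES = ["a", "b", "c", "d", "e"]
DISTANCE = {
    "a": [0, 5, 6, 0, 6],
    "b": [0, 3, 4, 0, 5],
    "c": [0, 5, 3, 0, 4],
    "d": [0, 5, 6, 0, 4],
    "e": [0, 4, 5, 0, 3],
}

def cplx(distance, nodes):
    """Sum of the distances over every ordered pair of nodes, as in formula (1)."""
    return sum(distance[src][j] for src in nodes for j in range(len(nodes)))

row_sums = {src: sum(DISTANCE[src]) for src in NODES}
total = cplx(DISTANCE, NODES)
print(row_sums)   # per-source contributions
print(total)      # 17 + 12 + 12 + 15 + 12 = 68
```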

If Assumption 5 and Property CD3 are taken into account, then another 
complexity metric can be defined as follows: 

\[ Cplx(DDG) = \sum_{i=1}^{|E|} \sum_{j=1}^{|E|} Distance(Node_i, Node_j) \qquad (2) \]


Note that the formula is identical but the abstraction used is different, i.e., 
Data-Dependency Graphs (DDG). This metric is therefore different from the 
one in (1). The weight of condition expressions in formula (2) has increased 
since path distances are made longer by the link between condition 
expressions and the definitions that belong to the block they control. 

Discussion 

Once metrics have been defined, it must be proven that they are consistent 
with the generic and context-dependent properties. With reference to our 
examples, it can be easily shown that the metrics we define for size and 
complexity satisfy their respective sets of generic and context-dependent 
properties. Thus, they can be shown to preserve the intuitive order defined 
on the abstractions with respect to the quality focus. 


8. Experimental Validation of the Metrics (Step 6) 

After defining metrics in Step 5, the data collected on actual software 
products and processes must be used to validate the metrics 
experimentally. This is done differently according to the purpose of 
measurement. With respect to prediction, it is required to validate the 
assumptions on which the product metrics are based. In other words, 
significant statistical relationships must be identified between the product 
metrics and the quality focus (or rather a particular descriptive model of 
the quality focus) and, furthermore, these relationships must be consistent 
with what is specified by the assumptions. Validation procedures for other 
measurement purposes (e.g., characterization) will not be discussed here. 
With respect to prediction, experimental validation may be seen as a search for statistical 
relationships between metrics of the object of study and a descriptive model of the quality 
focus (e.g., # errors for Error-proneness). 


Numerous analysis techniques, both univariate and multivariate 
[S92, BBH93, DG84], exist in the statistical and machine learning 
literature. If such assumptions and properties are not validated, we need to 
repeat from Step 2, re-consider the assumptions and properties, then re- 
define new metrics. This metric definition/validation cycle is iterated until 
the metric validation yields satisfactory results. Since extensive material is 
available on the subject, we will not describe this step any further. 


9. Conclusions and Future Work 

Product metrics need to be defined in a rigorous and disciplined manner 
based on a precisely stated experimental goal, assumptions, properties, and 
a thorough experimental validation. In order to do so, we propose a 
definition approach that is intended to help analysts develop product 
metrics. This approach integrates many contributions from the literature 
and is intended to be the starting point for a practical product metric 
definition approach to be discussed by the software engineering 
community, on both the academic and industrial sides. This approach is 
the result of our past experience [BMB93, BBH93, BMB94(a)] and is 
validated through realistic examples. 

Our future work encompasses a more detailed study and validation of 
each of the steps involved in the metric definition approach. In this 
framework, we proposed definitions for the measurement concepts usually 
encountered in software engineering, such as complexity, size, coupling, 
cohesion, etc. [BMB94(b)]. Such work aims at building a formal, 
unambiguous, and comprehensive theory. Also, we need to better 
understand how experimental results can be used to guide the refinement 
of metrics. The refinement process of metrics needs to be better understood 
and defined so that metrics can evolve with the increase in understanding 
and refinement of the studied development processes. Last, we need to 
better identify what can be reused across environments and projects, e.g., 
metrics, assumptions, measurement concepts, product abstractions. 
Abstract 

Little theory exists in the field of software system measurement. Concepts such as complexity, 
coupling, cohesion or even size are very often subject to interpretation and appear to have 
inconsistent definitions in the literature. As a consequence, there is little guidance provided to the 
analyst attempting to define proper measures for specific problems. Many controversies in the 
literature are simply misunderstandings and stem from the fact that some people talk about different 
measurement concepts under the same label (complexity is the most common case). 

There is a need to define unambiguously the most important measurement concepts used in 
the measurement of software products. One way of doing so is to define precisely what 
mathematical properties characterize these concepts, regardless of the specific software artifacts to 
which these concepts are applied. Such a mathematical framework could generate a consensus in 
the software engineering community and provide a means for better communication among 
researchers, better guidelines for analysts, and better evaluation methods for commercial static 
analyzers for practitioners. 

In this paper, we propose a mathematical framework which is generic, because it is not 
specific to any particular software artifact, and rigorous, because it is based on precise 
mathematical concepts. This framework defines several important measurement concepts (size, 
length, complexity, cohesion, coupling). It does not intend to be complete or fully objective; other 
frameworks could have been proposed and different choices could have been made. However, we 
believe that the formalisms and properties we introduce are convenient and intuitive. In addition, 
we have reviewed the literature on this subject and compared it with our work. This framework 
contributes constructively to a firmer theoretical ground of software measurement. 


1. Introduction 

Many concepts have been introduced through the years to define the characteristics of the artifacts 
produced during the software process. For instance, one speaks of size and complexity of software 
specification, design, and code, or cohesion and coupling of a software design or code. Several 
techniques have been introduced, with the goal of producing software which is better with respect 
to these concepts. As an example, Parnas' [P72] design principles attempt to decrease coupling 
between modules, and increase cohesion within modules. These concepts are used as a guide to 
choose among alternative techniques or artifacts. For instance, a technique may be preferred over 
another because it yields artifacts that are less complex; an artifact may be preferred over another 
because it is less complex. In turn, lower complexity is believed to provide advantages such as 
lower maintenance time and cost. This shows the importance of a clear and unambiguous 
understanding of what these concepts actually mean, to make choices on more objective bases. The 
definition of relevant concepts (i.e., classes of software characterization measures) is the first step 
towards quantitative assessment of software artifacts and techniques, which is needed to assess 
risk and find optimal trade-offs between software quality, schedule and cost of development. 

To capture these concepts in a quantitative fashion, hundreds of software measures have 
been defined in the literature. However, the vast majority of these measures did not survive the 
proposal phase, and did not manage to get accepted in the academic or industrial worlds. One 
reason for this is the fact that they have not been built using a clearly defined process for defining 
software measures. As we propose in [BMB94(b)], such a process should be driven by clearly 
identified measurement goals and knowledge of the software process. One of its crucial activities is 
the precise definition of relevant concepts, necessary to lay down a rigorous framework for 
software engineering measures and to define meaningful and well-founded software measures. 
The theoretical soundness of a measure, i.e., the fact that it really measures the software 
characteristic it is supposed to measure, is an obvious prerequisite for its acceptability and use. The 
exploratory process of looking for correlations is not an acceptable scientific validation process in 
itself if it is not accompanied by a solid theory to support it. Unfortunately, new software measures 
are very often defined to capture elusive concepts such as complexity, cohesion, coupling, 
connectivity, etc. (Only size can be thought to be reasonably well understood.) Thus, it is 
impossible to assess the theoretical soundness of newly proposed measures, and the acceptance of 
a new measure is mostly a matter of belief. 

To this end, several proposals have appeared in the literature [LJS91, TZ92, W88] in 
recent years to provide desirable properties for software measures. These works (especially 
[W88]) have been used to "validate" existing and newly proposed software measures. 
Surprisingly, whenever a new measure which was proposed as a software complexity measure did 
not satisfy the set of properties against which it was checked, several authors failed to conclude 
that their measure was not a software complexity measure, e.g., [CK94, H92]. Instead, they 
concluded that their measure was a complexity measure that does not satisfy that set of properties 
for complexity measures. What they actually did was provide an absolute definition of a software 
complexity measure and check whether the properties were consistent with respect to the measure, 
i.e., check the properties against their own measure. 

This situation would be unacceptable in other engineering or mathematical fields. For 
instance, suppose that one defines a new measure, claiming it is a distance measure. Suppose also 
that that measure fails to satisfy the triangle inequality, which is the characterizing property of 
distance measures. The natural conclusion would be to realize that that is not a distance measure, 
rather than to say that it is a distance measure that does not satisfy the conditions for a distance 
measure. However, it is true that none of the sets of properties proposed so far has reached so 
wide an acceptance to be considered "the" right set of necessary properties for complexity. It is our 
position that this odd situation is due to the fact that there are several different concepts that are still 
covered by the same word: complexity. 

Within the set of commonly mentioned software characteristics, size and complexity are the 
ones that have received the widest attention. However, the majority of authors have been inclined 
to believe that a measure captures either size or complexity, as if, besides size, all other concepts 
related to software characteristics could be grouped under the unique name of complexity. 
Sometimes, even size has been considered as a particular kind of complexity measure. 

Actually, these concepts capture different software characteristics, and, until they are 
clearly separated and their similarities and differences clearly studied, it will be impossible to reach 
any kind of consensus on the properties that characterize each concept relevant to the definition of 
software measures. The goal of this paper is to lay down the basis for a discussion on this subject, 
by providing properties for a — partial — set of measurement concepts that are relevant for the 
definition of software measures. Many of the measure properties proposed in the literature are 
generic in the sense that they do not characterize specific measurement concepts but are relevant to 
all syntactically-based measures (see [S92, TZ92, W88]). In this paper, we want to focus on 
properties that differentiate measurement concepts such as size, complexity, coupling, etc. Thus, 
we want to identify and clarify the essential properties behind these concepts that are commonplace 
in software engineering and form important classes of measures. Thus, researchers will be able to 
validate their new measures by checking properties specifically relevant to the class (or concept) 
they belong to (e.g., size should be additive). By no means should these properties be regarded as 
the unique set of properties that can be possibly defined for a given concept. Rather, we want to 
provide a theoretically sound and convenient solution for differentiating a set of well known 
concepts and check their analogies and conflicts. Possible applications of such a framework are to 
guide researchers in their search for new measures and help practitioners evaluate the adequacy of 
measures provided by commercial tools. 

We also believe that the investigation of measures should also address artifacts produced in 
the software process other than code. It is commonly believed that the early software process 
phases are the most important ones, since the rest of the development depends on the artifacts they 
produce. Oftentimes, the concepts (e.g., size, complexity, cohesion, coupling) which are believed 
relevant with respect to code are also relevant for other artifacts. To this end, the properties we 
propose will be general enough to be applicable to a wide set of artifacts. 

The paper is organized as follows. In Section 2, we introduce the basic definitions of our 
framework. Section 3 provides a set of properties that characterize and formalize intuitively 
relevant measurement concepts: size, length, complexity, cohesion, coupling. We also discuss the 
relationships and differences between the different concepts. Some of the best-known measures are 
used as examples to illustrate our points. Section 4 contains comparisons and discussions 
regarding the set of properties for complexity measures defined in the paper and in the literature. 
The conclusions and directions for future work come in Section 5. 


2. Basic Definitions 

Before introducing the necessary properties for the set of concepts we intend to study, we provide 
basic definitions related to the objects of study (to which these concepts can be applied), e.g., size 
and complexity of what? 

Systems and modules 

Two of the concepts we will investigate, namely, size (Section 3.1) and complexity (Section 3.3) 
are related to systems, in general, i.e., one can speak about the size of a system and the complexity 
of a system. We also introduce a new concept, length (Section 3.2), which is related to systems. In 
our general framework (recall that we want these properties to be as independent as possible of 
any product abstraction), a system is characterized by its elements and the relationships between 
them. Thus, we do not reduce the number of possible system representations, as elements and 
relationships can be defined according to needs. 

Definition 1: Representation of Systems and Modules 

A system S will be represented as a pair <E,R>, where E represents the set of elements of S, and 
R is a binary relation on E (R ⊆ E × E) representing the relationships between S's elements. 

Given a system S = <E,R>, a system m = <Em,Rm> is a module of S if and only if 
Em ⊆ E, Rm ⊆ Em × Em, and Rm ⊆ R. As an example, E can be defined as the set of code 
statements and R as the set of control flows from one statement to another. A module m may be a 
code segment or a subprogram. 

The elements of a module are connected to the elements of the rest of the system by 
incoming and outgoing relationships. The set InputR(m) of relationships from elements outside 
module m = <Em,Rm> to those of module m is defined as 

InputR(m) = {<e1,e2> ∈ R | e2 ∈ Em and e1 ∈ E - Em} 

The set OutputR(m) of relationships from the elements of a module m = <Em,Rm> to those of the 
rest of the system is defined as 

OutputR(m) = {<e1,e2> ∈ R | e1 ∈ Em and e2 ∈ E - Em} 

◊ 
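
Definition 1 can be sketched in a few lines of code. The sketch below is only an illustration under our own assumptions: elements are represented as strings, relationships as (source, target) pairs, the module condition is read as Em ⊆ E, Rm ⊆ Em × Em, and Rm ⊆ R, and the four-statement system is hypothetical.

```python
def is_module(Em, Rm, E, R):
    """True if <Em, Rm> is a module of the system <E, R> (Definition 1, as read here)."""
    within_module = all(src in Em and dst in Em for (src, dst) in Rm)
    return Em <= E and Rm <= R and within_module

def input_r(Em, E, R):
    """InputR(m): relationships from elements outside the module to elements inside it."""
    return {(e1, e2) for (e1, e2) in R if e2 in Em and e1 in E - Em}

def output_r(Em, E, R):
    """OutputR(m): relationships from elements inside the module to elements outside it."""
    return {(e1, e2) for (e1, e2) in R if e1 in Em and e2 in E - Em}

# Hypothetical system: four code statements linked by sequential control flow.
E = {"s1", "s2", "s3", "s4"}
R = {("s1", "s2"), ("s2", "s3"), ("s3", "s4")}

# A code segment made of the two middle statements.
Em = {"s2", "s3"}
Rm = {("s2", "s3")}

print(is_module(Em, Rm, E, R))   # the segment is a module of the system
print(input_r(Em, E, R))         # the flow entering the segment
print(output_r(Em, E, R))        # the flow leaving the segment
```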
We now introduce inclusion, union, intersection operations for modules and the definitions of 
empty and disjoint modules, which will be used often in the remainder of the paper. For notational 
convenience, they will be denoted by extending the usual set-theoretic notation. We will illustrate 
these operations by means of the system S = <E,R> represented in Figure 1, where 
E = {a,b,c,d,e,f,g,h,i,j,k,l,m} and R = {<b,a>,<b,f>,<c,b>,<c,d>,<c,g>,<d,f>,<e,g>,<f,i>, 
<f,k>,<g,m>,<h,a>,<h,i>,<i,j>,<k,j>,<k,l>}. We will consider the following modules, each of 
which is drawn in Figure 1 as an area filled with its own pattern: 

m1 = <Em1,Rm1> = <{a,b,f,i,j,k},{<b,a>,<b,f>,<f,i>,<f,k>,<i,j>,<k,j>}> 

m2 = <Em2,Rm2> = <{f,j,k},{<f,k>,<k,j>}> 

m3 = <Em3,Rm3> = <{c,d,e,f,g,j,k,m},{<c,d>,<c,g>,<d,f>,<e,g>,<f,k>,<g,m>,<k,j>}> 

m4 = <Em4,Rm4> = <{d,e,g},{<e,g>}> 

Inclusion. Module m1 = <Em1,Rm1> is said to be included in module m2 = <Em2,Rm2> 
(notation: m1 ⊆ m2) if Em1 ⊆ Em2 and Rm1 ⊆ Rm2. In Figure 1, m4 ⊆ m3. 

Union. The union of modules m1 = <Em1,Rm1> and m2 = <Em2,Rm2> (notation: m1 ∪ m2) 
is the module <Em1 ∪ Em2, Rm1 ∪ Rm2>. In Figure 1, the union of modules m1 and m3 is 
module m13 = <{a,b,c,d,e,f,g,i,j,k,m},{<b,a>,<b,f>,<c,d>,<c,g>,<d,f>,<e,g>,<f,i>, 
<f,k>,<g,m>,<i,j>,<k,j>}> (the area filled with either of the two patterns). 

Intersection. The intersection of modules m1 = <Em1,Rm1> and m2 = <Em2,Rm2> (notation: 
m1 ∩ m2) is the module <Em1 ∩ Em2, Rm1 ∩ Rm2>. In Figure 1, m2 = m1 ∩ m3. 

Empty module. Module <∅,∅> (denoted by ∅) is the empty module. 

Disjoint modules. Modules m1 and m2 are said to be disjoint if m1 ∩ m2 = ∅. In Figure 1, 
m1 ∩ m4 = ∅. 



Figure 1. Operations on modules. 
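
The module operations can be checked mechanically on the modules of Figure 1. The following is a minimal sketch under our own conventions: a module is a pair of sets, relationships are (source, target) pairs, and the module contents are our reading of the listing in the text. The helper `edges` is introduced here only to keep the data compact.

```python
def edges(spec):
    """Parse 'ba bf' into {('b', 'a'), ('b', 'f')}; each token is <source,target>."""
    return {(pair[0], pair[1]) for pair in spec.split()}

# Modules of Figure 1, as read from the text (a module is a pair <elements, relationships>).
m1 = (set("abfijk"), edges("ba bf fi fk ij kj"))
m2 = (set("fjk"), edges("fk kj"))
m3 = (set("cdefgjkm"), edges("cd cg df eg fk gm kj"))
m4 = (set("deg"), edges("eg"))
EMPTY = (set(), set())

def included(x, y):
    """x is included in y if both its elements and its relationships are subsets."""
    return x[0] <= y[0] and x[1] <= y[1]

def union(x, y):
    return (x[0] | y[0], x[1] | y[1])

def intersection(x, y):
    return (x[0] & y[0], x[1] & y[1])

def disjoint(x, y):
    return intersection(x, y) == EMPTY

m13 = union(m1, m3)
print(included(m4, m3))             # m4 is included in m3
print(intersection(m1, m3) == m2)   # m2 is the intersection of m1 and m3
print(disjoint(m1, m4))             # m1 and m4 are disjoint
print(sorted(m13[0]), len(m13[1]))  # elements and number of relationships of m1 ∪ m3
```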

Since in this framework modules are just subsystems, all systems can theoretically be decomposed 
into modules. The definition of a module for a particular measure in a specific context is just a 
matter of convenience and programming environment (e.g., language) constraints. 
Modular systems 

The other two concepts we will investigate, cohesion (Section 3.4) and coupling (Section 3.5), are 
meaningful only with reference to systems that are provided with a modular decomposition, i.e., 
one can speak about cohesion and coupling of a whole system only if it is structured into modules. 
One can also speak about cohesion and coupling of a single module within a whole system. 

Definition 2: Representation of Modular Systems 

The 3-tuple MS = <E,R,M> represents a modular system if S = <E,R> is a system according to 
Definition 1, and M is a collection of modules of S such that 

∀ e ∈ E (∃ m ∈ M (m = <Em,Rm> and e ∈ Em)) and 

∀ m1, m2 ∈ M (m1 = <Em1,Rm1> and m2 = <Em2,Rm2> and Em1 ∩ Em2 = ∅) 

i.e., the set of elements E of MS is partitioned into the sets of elements of the modules. 

We denote the union of all the Rm’s as IR. It is the set of intra-module relationships. Since 
the modules are disjoint, the union of all OutputR(m)'s is equal to the union of all InputR(m)'s, 
which is equal to R-IR. It is the set of inter-module relationships. 

0 

As an example, E can be the set of all declarations of a set of Ada modules, R the set of 
dependencies between them, and M the set of Ada modules. 

Figure 2 shows a modular system MS = <E,R,M>, obtained by partitioning the set of 
elements of the system in Figure 1 in a different way. In this modular system, E and R are the 
same as in system S in Figure 1, and M = {m1,m2,m3}. Besides, IR = {<b,a>,<c,d>,<c,g>, 
<e,g>,<f,i>,<f,k>,<g,m>,<h,a>,<i,j>,<k,j>,<k,l>}. 
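
The split of R into intra-module and inter-module relationships can be sketched as follows. The element sets of the three modules are not listed in the text, so the partition used below is an assumption: it is one partition consistent with the intra-module set IR given above, and may not match the figure exactly. The sketch also assumes that each module contains every relationship of R between its own elements.

```python
def edges(spec):
    """Parse 'ba bf' into {('b', 'a'), ('b', 'f')}; each token is <source,target>."""
    return {(pair[0], pair[1]) for pair in spec.split()}

E = set("abcdefghijklm")
R = edges("ba bf cb cd cg df eg fi fk gm ha hi ij kj kl")

# ASSUMPTION: element sets inferred from the listed IR, not taken from the figure.
PARTS = [set("abh"), set("cdegm"), set("fijkl")]

def is_partition(elements, parts):
    """True if the parts cover all elements and are pairwise disjoint."""
    covered = set().union(*parts)
    return covered == elements and sum(len(p) for p in parts) == len(covered)

def intra_module(relation, parts):
    """Relationships whose two ends lie in the same part."""
    return {(x, y) for (x, y) in relation if any(x in p and y in p for p in parts)}

def output_r(part, elements, relation):
    return {(x, y) for (x, y) in relation if x in part and y in elements - part}

def input_r(part, elements, relation):
    return {(x, y) for (x, y) in relation if y in part and x in elements - part}

IR = intra_module(R, PARTS)
INTER = R - IR
all_out = set().union(*(output_r(p, E, R) for p in PARTS))
all_in = set().union(*(input_r(p, E, R) for p in PARTS))

print(sorted(IR))
print(sorted(INTER))   # equals both all_out and all_in
```

Under these assumptions the computed IR coincides with the set listed in the text, and the union of all OutputR(m)'s equals the union of all InputR(m)'s, which equals R - IR.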





Figure 2. A modular system. 

It should be noted that some measures do not take into account the modular structure of a system. 
As already mentioned, our concepts of size and complexity (defined in Sections 3.1 and 3.3) are 
such examples, i.e., in a modular system MS = <E,R,M>, one computes size and complexity of 
the system S = <E,R>, and M is not considered. 

We have defined concept properties using a graph-theoretic approach to allow us to be 
general and precise. It is general because our properties are defined so that no restriction applies to 
the definition of vertices and arcs. Many well known product abstractions fit this framework, e.g., 
data dependency graphs, definition-use graphs, control flow graphs, USES graphs, 
Is_Component_of graphs, etc. It is precise because, based on a well defined formalism, all the 
concepts used can be mathematically defined, e.g., system, module, modular system, and so can 
the properties presented in the next section. 
3. Concepts of Measurement and Properties 

It should be noted that the concepts defined below are to some extent subjective. However, we 
wish to assign them intuitive and convenient properties. We consider these properties necessary 
but not sufficient because they do not guarantee that the measures for which they hold are useful or 
even make sense. On the other hand, these properties will constrain the search for measures and 
therefore make the measure definition process more rigorous and less exploratory [BMB94(b)]. 
Several relevant concepts are studied: size, length, complexity, cohesion, and coupling. They do 
not represent an exhaustive list but a starting point for discussion that should eventually lead to a 
standard definition set in the software engineering community. 


3.1. Size 

Motivation 

Intuitively, size is recognized as being an important measurement concept. According to our 
framework, size cannot be negative (property Size.1), and we expect it to be null when a system 
does not contain any elements (property Size.2). When modules do not have elements in common, 
we expect size to be additive (property Size.3). 

Definition 3: Size 

The size of a system S is a function Size(S) that is characterized by the following properties Size.1 - 
Size.3. 

Property Size.1: Non-negativity 

The size of a system S = <E,R> is non-negative 

Size(S) ≥ 0 (Size.I) 

◊ 

Property Size.2: Null Value 

The size of a system S = <E,R> is null if E is empty 

E = ∅ ⇒ Size(S) = 0 (Size.II) 

◊ 

Property Size.3: Module Additivity 

The size of a system S = <E,R> is equal to the sum of the sizes of two of its modules 
m1 = <Em1,Rm1> and m2 = <Em2,Rm2> such that any element of S is an element of either m1 
or m2 

(m1 ⊆ S and m2 ⊆ S and E = Em1 ∪ Em2 and Em1 ∩ Em2 = ∅) 
⇒ Size(S) = Size(m1) + Size(m2) (Size.III) 

◊ 

For instance, the size of the system in Figure 2 is the sum of the sizes of its three modules 
m1, m2, m3. 

The last property Size.3 provides the means to compute the size of a system S = <E,R> from the 
knowledge of the size of its (disjoint) modules me = <{e},Re>, each of whose set of elements is 
composed of a different element e of E¹. 

¹ For each me, it is either Re = ∅ or Re = {<e,e>}. 

\[ Size(S) = \sum_{e \in E} Size(m_e) \qquad (Size.IV) \]

Therefore, adding elements to a system cannot decrease its size (size monotonicity property) 

(S' = <E',R'> and S'' = <E'',R''> and E' ⊆ E'') ⇒ Size(S') ≤ Size(S'') (Size.V) 

From the above properties Size.1 - Size.3, it follows that the size of a system S = <E,R> is not 
greater than the sum of the sizes of any pair of its modules m1 = <Em1,Rm1> and 
m2 = <Em2,Rm2>, such that any element of S is an element of m1, or m2, or both, i.e., 

(m1 ⊆ S and m2 ⊆ S and E = Em1 ∪ Em2) ⇒ Size(S) ≤ Size(m1) + Size(m2) (Size.VI) 

The size of a system built by merging such modules cannot be greater than the sum of the sizes of 
the modules, due to the presence of common elements (e.g., lines of code, operators, class 
methods). 
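
The size properties can be illustrated with the simplest candidate measure, the number of elements of a system (in the spirit of #Statements). The sketch below is an illustration only: it represents a system as a pair of sets, uses module contents as we read them from the Figure 1 listing, and checks the properties on these particular modules rather than proving them in general.

```python
def edges(spec):
    """Parse 'ba bf' into {('b', 'a'), ('b', 'f')}; each token is <source,target>."""
    return {(pair[0], pair[1]) for pair in spec.split()}

def size(system):
    """Candidate size measure: the number of elements of the system."""
    elements, _relationships = system
    return len(elements)

def union(x, y):
    return (x[0] | y[0], x[1] | y[1])

# Modules as read from the Figure 1 listing: m1 and m4 are disjoint, m1 and m3 overlap.
m1 = (set("abfijk"), edges("ba bf fi fk ij kj"))
m3 = (set("cdefgjkm"), edges("cd cg df eg fk gm kj"))
m4 = (set("deg"), edges("eg"))

# Null value: an empty system has size 0.
print(size((set(), set())))

# Module additivity on disjoint modules: the system m1 ∪ m4 has size 6 + 3.
print(size(union(m1, m4)), size(m1) + size(m4))

# With common elements, the size of the union is at most the sum of the sizes.
print(size(union(m1, m3)), size(m1) + size(m3))
```

The last line shows the effect described by (Size.VI): the elements f, j, and k are shared by m1 and m3 and are counted only once in the merged system.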

Properties Size.1 - Size.3 hold when applying the admissible transformation of the ratio scale 
[F91]. Therefore, there is no contradiction between our concept of size and the definition of size 
measures on a ratio scale. 

Examples and counterexamples of size measures 

Several measures introduced in the literature can be classified as size measures, according to our 
properties Size.1 - Size.3. With reference to code measures, we have: LOC, #Statements, 
#Modules, #Procedures, Halstead's Length [H77], #Occurrences of Operators, #Occurrences of 
Operands, #Unique Operators, #Unique Operands. In each of the above cases, the representation 
of a program as a system is quite straightforward. Each counted entity is an element, and the 
relationship between elements is just the sequential relationship. 

Some other measures that have been introduced as size measures do not satisfy the above 
properties. Instances are the Estimator of length, and Volume [H77], which are not additive when 
software modules are disjoint (property Size.3). Indeed, for both measures, the value obtained 
when two disjoint software modules are concatenated may be less than the sum of the values 
obtained for each module, since they may contain common operators or operands. Note that, in 
this context, the graph is just the sequence of operand and operator occurrences. Disjoint code 
segments are disjoint subgraphs. 

On the other hand, other measures, that are meant to capture other concepts, are indeed size 
measures. For instance, in the object-oriented suite of measures defined in [CK94], Weighted 
Methods per Class (WMC) is defined as the sum of the complexities of methods in a class. 
Implicitly, the program is seen as a directed acyclic graph (a hierarchy) whose terminal nodes are 
methods, and whose nonterminal nodes are classes. When two classes without methods in 
common are merged, the resulting class's WMC is equal to the sum of the two WMC's of the 
original classes (property Size.3 is satisfied). When two classes with methods in common are 
merged, then the WMC of the resulting class may be lower than the sum of the WMC's of the two 
original classes (formula Size.VI). Therefore, since all size properties hold (it is straightforward to 
show that properties Size.1 and Size.2 are true for WMC), this is a class size measure. However, 
WMC does not satisfy our properties for complexity measures (see Section 3.3). Likewise, NOC 
(Number Of Children of a class) and Response For a Class (RFC) [CK94] are other size 
measures, according to our properties. 
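
The WMC argument can be illustrated with a small sketch. The class contents and complexity weights below are hypothetical, and a class is modeled simply as a mapping from method names to method complexities, which is an assumption of this example rather than part of [CK94].

```python
def wmc(method_complexities):
    """Weighted Methods per Class: sum of the complexities of the methods."""
    return sum(method_complexities.values())

def merge(class_a, class_b):
    """Merge two classes; a method present in both is kept only once."""
    return {**class_a, **class_b}

# Hypothetical classes: method name -> method complexity.
stack = {"push": 2, "pop": 3}
queue = {"enqueue": 2, "dequeue": 4}
history = {"pop": 3, "peek": 1}   # shares the method "pop" with stack

disjoint_merge = merge(stack, queue)
overlap_merge = merge(stack, history)

print(wmc(disjoint_merge), wmc(stack) + wmc(queue))    # 11 11 (Size.3)
print(wmc(overlap_merge), wmc(stack) + wmc(history))   # 6 9 (Size.VI)
```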

3.2. Length 

Motivation 

Properties Size.1 - Size.3 characterize the concept of size as is commonly intended in software 
engineering. Actually, the concept of size may have different interpretations in everyday life, 
depending on the measurement goal. For instance, suppose we want to park a car in a parallel 
parking space. Then, the "size" we are interested in is the maximum distance between two points 
of the car linked by a segment parallel to the car's motion direction. The above properties Size.1 - 
Size.3 do not aim at defining such a measure of size. With respect to physical objects, volume and 
weight satisfy the above properties. In the particular case that the objects are unidimensional (or 
that we are interested in carrying out measurements with respect to only one dimension), then these 
concepts coincide. 

In order to differentiate this measurement concept from size, we call it length. Length is 
non-negative (property Length.1), and equal to 0 when there are no elements in the system 
(property Length.2). In extreme situations where systems are composed of unrelated elements, this 
property allows length to be non-null. If a new relationship is introduced between two elements 
belonging to the same connected component (footnote 2) of the graph representing a system, the length of the 
new system is not greater than the length of the original system (property Length.3). The idea is 
that, in this case, a new relationship may make the elements it connects "closer" than they were. 
This new relationship may reduce the maximum distance between elements in the connected 
component of the graph, but it may never increase it. On the other hand, if a new relationship is 
introduced between two elements belonging to two different connected components, the length of 
the new system is not smaller than the length of the original system. This stems from the fact that 
the new relationship creates a new connected component, where the maximum distance between 
two elements cannot be less than the maximum distance between any two elements of either 
original connected component (property Length.4). Length is not additive for disjoint modules. 
The length of a system containing several disjoint modules is the maximum length among them 
(property Length.5). 

Definition 4: Length 

The length of a system S is a function Length(S) characterized by the following properties 
Length.1 - Length.5. 

◊


Property Length.1: Non-negativity 

The length of a system S = <E,R> is non-negative 

\[ \mathrm{Length}(S) \geq 0 \qquad \text{(Length.I)} \]

◊


Property Length.2: Null Value 

The length of a system S = <E,R> is null if E is empty 

\[ (E = \emptyset) \Rightarrow (\mathrm{Length}(S) = 0) \qquad \text{(Length.II)} \]

◊


Property Length.3: Non-increasing Monotonicity for Connected Components 

Let S be a system and m be a module of S such that m is represented by a connected component of 
the graph representing S. Adding relationships between elements of m does not increase the length 
of S 

\[
\begin{aligned}
(& S = \langle E,R \rangle \text{ and } m = \langle E_m,R_m \rangle \text{ and } m \subseteq S \\
& \text{and } m \text{ ``is a connected component of } S\text{'' and} \\
& S' = \langle E,R' \rangle \text{ and } R' = R \cup \{\langle e_1,e_2 \rangle\} \text{ and } \langle e_1,e_2 \rangle \notin R \\
& \text{and } e_1 \in E_m \text{ and } e_2 \in E_m) \Rightarrow \mathrm{Length}(S) \geq \mathrm{Length}(S') \qquad \text{(Length.III)}
\end{aligned}
\]

◊

Footnote 2: Here, two elements of a system S are said to belong to the same connected component if there is a path from one to 
the other in the non-directed graph obtained from the graph representing S by removing directions in the arcs. 

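Property Length.4: Non-decreasing Monotonicity for Non-connected Components 

Let S be a system and m1 and m2 be two modules of S such that m1 and m2 are represented by two 
separate connected components of the graph representing S. Adding relationships from elements of 
m1 to elements of m2 does not decrease the length of S 

\[
\begin{aligned}
(& S = \langle E,R \rangle \text{ and } m_1 = \langle E_{m_1},R_{m_1} \rangle \text{ and } m_2 = \langle E_{m_2},R_{m_2} \rangle \\
& \text{and } m_1 \subseteq S \text{ and } m_2 \subseteq S \text{ ``are separate connected components of } S\text{'' and} \\
& S' = \langle E,R' \rangle \text{ and } R' = R \cup \{\langle e_1,e_2 \rangle\} \text{ and } \langle e_1,e_2 \rangle \notin R \\
& \text{and } e_1 \in E_{m_1} \text{ and } e_2 \in E_{m_2}) \Rightarrow \mathrm{Length}(S') \geq \mathrm{Length}(S) \qquad \text{(Length.IV)}
\end{aligned}
\]

◊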


Property Length.5: Disjoint Modules 

The length of a system S = <E,R> made of two disjoint modules m1, m2 is equal to the maximum 
of the lengths of m1 and m2 

\[ (S = m_1 \cup m_2 \text{ and } m_1 \cap m_2 = \emptyset \text{ and } E = E_{m_1} \cup E_{m_2}) \Rightarrow \mathrm{Length}(S) = \max\{\mathrm{Length}(m_1), \mathrm{Length}(m_2)\} \qquad \text{(Length.V)} \]

◊

Let us illustrate the last three properties with systems S, S', S", represented in Figure 3. We will 
assume that m1 = m'1 = m"1, m2 = m'2 = m"2, and m3 = m'3 = m"3. The length of system 
S, composed of the three connected components m1, m2, and m3, is the maximum value among 
the lengths of m1, m2, and m3 (property Length.V). System S' differs from system S only because 
of the added relationship <c,m> (represented by the thick dashed arrow), which connects two 
elements already belonging to a connected component of S, m3. The length of system S' is not 
greater than the length of S (property Length.III). System S" differs from system S only because 
of the added relationship <b,f> (represented by the thick solid arrow), which connects two 
elements belonging to two different connected components of S, m1 and m2. The length of system 
S" is not less than the length of S (property Length.IV). 

Properties Length.1 - Length.5 hold when applying the admissible transformation of the 
ratio scale. Therefore, there is no contradiction between our concept of length and the definition of 
length measures on a ratio scale. 
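
The behavior required by properties Length.III - Length.V can be checked on one candidate measure. The sketch below assumes, purely for illustration, that length is the largest shortest-path distance (counted in arcs, ignoring directions) between two elements of the same connected component. The graph is a hypothetical one in the spirit of Figure 3, not a transcription of it.

```python
from collections import deque

def length(elements, relationships):
    """Largest shortest-path distance between two elements of the same
    connected component, ignoring arc directions; 0 for an empty system."""
    neighbours = {e: set() for e in elements}
    for a, b in relationships:
        neighbours[a].add(b)
        neighbours[b].add(a)
    longest = 0
    for start in elements:
        # Breadth-first search gives the distances from 'start'.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest

# Hypothetical system with three connected components.
E = list("abcdefghi")
R = [("a", "b"), ("b", "c"),               # m1: chain of three elements
     ("d", "e"),                           # m2: chain of two elements
     ("f", "g"), ("g", "h"), ("h", "i")]   # m3: chain of four elements

len_s = length(E, R)                       # maximum over m1, m2, m3
len_s1 = length(E, R + [("f", "i")])       # new arc inside m3
len_s2 = length(E, R + [("c", "d")])       # new arc from m1 to m2

print(len_s, len_s1, len_s2)   # 3 2 4
```

In this example the arc added inside m3 shortens the longest distance (3 becomes 2), while the arc joining m1 and m2 creates a longer component (3 becomes 4).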

Examples of length measures 

Several measures can be defined at the system or module level based on the length concept. A 
typical example is the depth of a hierarchy. Therefore, the nesting depth in a program [F91] and 
DIT (Depth of Inheritance Tree, which is actually a hierarchy in the general case) defined in 
[CK94] are length measures. 


3.3. Complexity 

Motivation 

Intuitively, complexity is a measurement concept that is considered extremely relevant to system 
properties. It has been studied by several researchers (see Section 4 for a comparison between our 
framework and the literature). In our framework, we expect complexity to be non-negative 
(property Complexity.1) and to be null (property Complexity.2) when there are no relationships 
between the elements of a system. However, it could be argued that the complexity of a system 
whose elements are not connected to each other does not need to be necessarily null, because each 
element of E may have some complexity of its own. In our view, complexity is a system property 
that depends on the relationships between elements, and is not an isolated element's property. The 
complexity that an element taken in isolation may, intuitively, bring can only originate from the 
relationships between its "subelements." For instance, in a modular system, each module can be 
viewed as a "high-level element" encapsulating "subelements." However, if we want to consider 
the system as composed of such "high-level elements" (E), we should not "unpack" them, but only 
consider them and their relationships, without considering their "subelements" (E'). Otherwise, if 
we want to consider the contribution of the relationships between "subelements" (R'), we actually 
have to represent the system as $S = \langle E', R \cup R' \rangle$. 





Figure 3. Properties of length. 
Complexity should not be sensitive to representation conventions with respect to the direction of 
arcs representing system relationships (property Complexity.3). A relation can be represented in 
either an "active" ($R$) or "passive" ($R^{-1}$) form. The system and the relationships between its 
elements are not affected by these two equivalent representation conventions, so a complexity 
measure should be insensitive to this. 

Also, the complexity of a system S should be at least as much as the sum of the 
complexities of any collection of its modules, such that no two modules share relationships, but 
may only share elements (property Complexity.4). We believe that this property is the one that 
most strongly differentiates complexity from the other system concepts. Intuitively, this property 
may be explained by two phenomena. First, the transitive closure of R is a larger graph than the 
graph obtained as the union of the transitive closures of R' and R" (where R' and R" are 
contained in R). As a consequence, if any kind of indirect (i.e., transitive) relationships between 
elements is considered in the computation of complexity, then the complexity of S may be larger 
than the sum of its modules' complexities, when the modules do not share any relationship. 
Otherwise, they are equal. Second, merging modules may implicitly generate relationships between the 
elements of the modules (e.g., definition-use relationships may be created when blocks are 
merged into a common system). As a consequence of the above properties, system complexity 
should not decrease when the set of system relationships is increased (property Complexity.4). 

However, it has been argued that it is not always the case that the more relationships 
between the elements of a system, the more complex the system. For instance, it has been argued 
that adding a relationship between two elements may make the understanding of the system easier, 
since it clarifies the relationship between the two. This is certainly true, but we want to point out 
that this assertion is related to understandability, rather than complexity, and that complexity is 
only one of the factors that contribute to understandability. There are other factors that have a 
strong influence on understandability, such as the amount of available context information and 
knowledge about a system. In the literature [MGB90], it has been argued that the inner loop of the 
ShellSort algorithm, taken in isolation, is less understandable than the whole algorithm, since the 
role of the inner loop in the algorithm cannot be fully understood without the rest of the algorithm. 
This shows that understandability improves because a larger amount of context information is 
available, rather than because the complexity of the ShellSort algorithm is less than that of its inner 
loop. As another example, a relationship between two elements of a system may be added to 
explicitly state a relationship between them that was implicit or uncertain. This adds to our 
knowledge of the system, while, at the same time, increases complexity (according to our 
properties). In some cases (see above examples), the gain in context information/knowledge may 
overcome the increase in complexity and, as a result, may improve understandability. This stems 
from the fact that several phenomena concurrently affect understandability and does not mean in 
any way that an increase in complexity increases understandability. 

Last, the complexity of a system made of disjoint modules is the sum of the complexities of 
the single modules (property Complexity.5). Consistent with property Complexity.4, this property 
is intuitively justified by the fact that the transitive closure of a graph composed of several disjoint 
subgraphs is equal to the union of the transitive closures of each subgraph taken in isolation. 
Furthermore, if two modules are put together in the same system, but they are not merged, i.e., 
they are still two disjoint modules in this system, then no additional relationships are generated from 
the elements of one to the elements of the other. 

The properties we define for complexity are, to a limited extent, a generalization of the 
properties several authors have already provided in the literature (see [LJS91, TZ92, W88]) for 
software code complexity, usually for control flow graphs. We generalize them because we may 
want to use them on artifacts other than software code and on abstractions other than control flow 
graphs. 

Definition 5: Complexity 

The complexity of a system S is a function Complexity(S) that is characterized by the following 
properties Complexity.1 - Complexity.5. 

◊

Property Complexity.1: Non-negativity 

The complexity of a system S = <E,R> is non-negative 

\[ \mathrm{Complexity}(S) \geq 0 \qquad \text{(Complexity.I)} \]

◊


Property Complexity.2: Null Value 

The complexity of a system S = <E,R> is null if R is empty 

\[ R = \emptyset \Rightarrow \mathrm{Complexity}(S) = 0 \qquad \text{(Complexity.II)} \]

◊


Property Complexity.3: Symmetry 

The complexity of a system S = <E,R> does not depend on the convention chosen to represent 
the relationships between its elements 

\[ (S = \langle E,R \rangle \text{ and } S^{-1} = \langle E,R^{-1} \rangle) \Rightarrow \mathrm{Complexity}(S) = \mathrm{Complexity}(S^{-1}) \qquad \text{(Complexity.III)} \]

◊


Property Complexity.4: Module Monotonicity 

The complexity of a system S = <E,R> is no less than the sum of the complexities of any two of 
its modules with no relationships in common 

\[
\begin{aligned}
(& S = \langle E,R \rangle \text{ and } m_1 = \langle E_{m_1},R_{m_1} \rangle \text{ and } m_2 = \langle E_{m_2},R_{m_2} \rangle \\
& \text{and } m_1 \cup m_2 \subseteq S \text{ and } R_{m_1} \cap R_{m_2} = \emptyset) \\
& \Rightarrow \mathrm{Complexity}(S) \geq \mathrm{Complexity}(m_1) + \mathrm{Complexity}(m_2) \qquad \text{(Complexity.IV)}
\end{aligned}
\]

◊

For instance, the complexity of the system shown in Figure 4 is not smaller than the sum of the 
complexities of m1 and m2. 



Figure 4. Module monotonicity of complexity. 

Property Complexity.5: Disjoint Module Additivity 

The complexity of a system S = <E,R> composed of two disjoint modules m1, m2 is equal to the 
sum of the complexities of the two modules 

\[
\begin{aligned}
(& S = \langle E,R \rangle \text{ and } S = m_1 \cup m_2 \text{ and } m_1 \cap m_2 = \emptyset) \\
& \Rightarrow \mathrm{Complexity}(S) = \mathrm{Complexity}(m_1) + \mathrm{Complexity}(m_2) \qquad \text{(Complexity.V)}
\end{aligned}
\]

◊



SEL-95-003 




For instance, the complexity of system S in Figure 2 is the sum of the complexities of its modules 
m1, m2, and m3. 

As a consequence of the above properties Complexity.1 - Complexity.5, it can be shown 
that adding relationships between elements of a system does not decrease its complexity 

\[ (S' = \langle E,R' \rangle \text{ and } S'' = \langle E,R'' \rangle \text{ and } R' \subseteq R'') \Rightarrow \mathrm{Complexity}(S') \leq \mathrm{Complexity}(S'') \qquad \text{(Complexity.VI)} \]

Properties Complexity.1 - Complexity.5 hold when applying the admissible transformation of the 
ratio scale. Therefore, there is no contradiction between our concept of complexity and the 
definition of complexity measures on a ratio scale. 

Comprehensive comparisons and discussions of previous work in the area of complexity 
properties are provided in Section 4. 

Examples and counterexamples of complexity measures 

In [O80], Oviedo proposed a data flow complexity measure (DF). In this case, systems are 
programs, modules are program blocks, elements are variable definitions or uses, and relationships 
are defined between the definition of a given variable and its uses. The measure in [O80] is simply 
defined as the number of definition-use pairs in a block or a program. Property Complexity.4 
holds. Given two modules (i.e., program blocks) which may only have common elements (i.e., no 
definition-use relationship is contained in both), the whole system (i.e., program) has a number of 
relationships (i.e., definition-use relationships) which is at least equal to the sum of the numbers of 
definition-use relationships of each module. Property Complexity.5 holds as well. The number of 
definition-use relationships of a system composed of two disjoint modules (i.e., blocks between 
which no definition-use relationship exists), is equal to the sum of the numbers of definition-use 
relationships of each module. As a conclusion, DF is a complexity measure according to our 
definition. 
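
The argument for DF can be sketched as a simple count of relationships. In the example below, the program, its blocks, and the "variable@line" naming of definitions and uses are hypothetical and chosen only for illustration.

```python
def data_flow_complexity(def_use_pairs):
    """DF as described above: the number of distinct definition-use pairs."""
    return len(set(def_use_pairs))

# Hypothetical program: each pair links a definition to one use of the
# same variable ("variable@line" is just a labeling convention here).
block_1 = {("x@1", "x@3"), ("y@2", "y@3")}
block_2 = {("z@4", "z@5")}
across_blocks = {("x@1", "x@5")}   # defined in block 1, used in block 2
program = block_1 | block_2 | across_blocks

df_program = data_flow_complexity(program)
df_blocks = data_flow_complexity(block_1) + data_flow_complexity(block_2)
print(df_program, df_blocks)   # 4 3, consistent with Complexity.IV

# Without the pair across the blocks, the two blocks are disjoint
# modules and DF is additive, consistent with Complexity.V.
df_disjoint = data_flow_complexity(block_1 | block_2)
print(df_disjoint)             # 3
```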

In [McC76], McCabe proposed a control flow complexity measure. Given a control flow 
graph G = <E,R> (which corresponds, unchanged, to a system for our framework), 
Cyclomatic Complexity is defined as 

\[ v(G) = |R| - |E| + 2p \]

where p is the number of connected components of G. Let us now check whether v(G) is a 
complexity measure according to our definition. It is straightforward to show that, except 
Complexity.4, the other properties hold. In order to check property Complexity.4, let G = <E,R> 
be a control flow graph and G1 = <E1,R1> and G2 = <E2,R2> two non-disjoint control flow 
subgraphs of G such that they have nodes in common but no relationships. We have to require that 
G1 and G2 be control flow subgraphs, because cyclomatic complexity is defined only for control 
flow graphs, i.e., graphs composed of connected components, each of which has a start node (a 
node with no incoming arcs) and an end node (a node with no outgoing arcs). Property 
Complexity.4 requires that the following inequality be true for all such G1 and G2 

\[ |R| - |E| + 2p \geq |R_1| - |E_1| + 2p_1 + |R_2| - |E_2| + 2p_2 \]

i.e., when G1 and G2 together contain all the relationships of G (so that $|R| = |R_1| + |R_2|$), 
$2(p_1 + p_2 - p) \leq |E_1| + |E_2| - |E|$, where p1 and p2 are the number of connected 
components in G1 and G2, respectively. This is not always true. For instance, consider Figure 5. 
G has 3 elements and 1 connected component; G1 and G2 have 2 nodes and 1 connected 
component apiece. Therefore, the above inequality is not true in this case (it would require 
$2 \leq 1$), and the cyclomatic 
number is not a complexity measure according to our definition. However, it can be shown that 
v(G)-p satisfies all the above complexity properties. From a practical perspective, especially in 
large systems, this correction does not have a significant impact on the value of the measure. 
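
The counterexample can be reproduced numerically. The sketch below assumes a three-node graph a -> b -> c split into two subgraphs that share only node b; this matches the node and component counts quoted above, but the exact shape of the graph in Figure 5 is an assumption.

```python
def components(nodes, arcs):
    """Number of connected components, ignoring arc directions."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for a, b in arcs:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})

def cyclomatic(nodes, arcs):
    """v(G) = |R| - |E| + 2p."""
    return len(arcs) - len(nodes) + 2 * components(nodes, arcs)

def corrected(nodes, arcs):
    """v(G) - p, the variant stated above to satisfy all the properties."""
    return cyclomatic(nodes, arcs) - components(nodes, arcs)

# Assumed graph: G1 and G2 share node b but no relationship.
G = ({"a", "b", "c"}, {("a", "b"), ("b", "c")})
G1 = ({"a", "b"}, {("a", "b")})
G2 = ({"b", "c"}, {("b", "c")})

v_whole, v_parts = cyclomatic(*G), cyclomatic(*G1) + cyclomatic(*G2)
c_whole, c_parts = corrected(*G), corrected(*G1) + corrected(*G2)

print(v_whole, v_parts)   # 1 2: Complexity.IV is violated by v(G)
print(c_whole, c_parts)   # 0 0: no violation for v(G) - p
```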

Figure 5. Control flow graph. 


Henry and Kafura [HK81] proposed an information flow complexity measure. In this context, 
elements are subprogram variables or parameters, modules are subprograms, relationships are 
either fan-ins or fan-outs. For a subprogram SP, the complexity is expressed as 
$\text{length} \cdot (\text{fan-in} \cdot \text{fan-out})^2$, where fan-in and fan-out are, respectively, the local (as defined in [HK81]) 
information flows from other subprograms to SP, and from SP to other subprograms. Such local 
information flows can be represented as relationships between parameters/variables of SP and 
parameters/variables of the other subprograms. Subprograms' parameters/variables are the system 
elements and the subprograms' fan-in and fan-out links are the relationships. Any size measure can 
be used for length (in [HK81] LOC was used). The justification for multiplying length and 
$(\text{fan-in} \cdot \text{fan-out})^2$ was that "The complexity of a procedure depends on two factors: the complexity of the 
procedure code and the complexity of the procedure's connections to its environment." The 
complexity of the procedure code is taken into account by length; the complexity of the 
subprogram's connections to its environment is taken into account by $(\text{fan-in} \cdot \text{fan-out})^2$. The 
complexity of a system is defined as the sum of the complexities of the individual subprograms. 
For the measure defined above, properties Complexity.1 - Complexity.4 hold. However, property 
Complexity.5 does not hold since, given two disjoint modules S1 and S2 with a measured 
information flow of, respectively, $\text{length}_1 \cdot (\text{fan-in}_1 \cdot \text{fan-out}_1)^2$ and $\text{length}_2 \cdot (\text{fan-in}_2 \cdot \text{fan-out}_2)^2$, the 
following statement is true: 

\[ \text{length} \cdot (\text{fan-in} \cdot \text{fan-out})^2 \geq \text{length}_1 \cdot (\text{fan-in}_1 \cdot \text{fan-out}_1)^2 + \text{length}_2 \cdot (\text{fan-in}_2 \cdot \text{fan-out}_2)^2 \]

where $\text{length} = \text{length}_1 + \text{length}_2$, $\text{fan-in} = \text{fan-in}_1 + \text{fan-in}_2$, and $\text{fan-out} = \text{fan-out}_1 + \text{fan-out}_2$. 

However, equality does not hold because of the exponent 2, which is not fully justified, 
and multiplication of fan-in and fan-out. Therefore, the Henry and Kafura [HK81] information flow 
measure is not a complexity measure according to our definition. However, fan-in and fan-out 
taken as separate measures, without exponent 2, are complexity measures according to our 
definition since all the required properties hold. 
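
A small numeric check makes the failure of additivity concrete. The values below are hypothetical, and the combined module is formed exactly as in the statement above, by adding the lengths, fan-ins, and fan-outs of the two disjoint modules.

```python
def information_flow(length, fan_in, fan_out):
    """Henry and Kafura's measure: length * (fan-in * fan-out)^2."""
    return length * (fan_in * fan_out) ** 2

# Hypothetical disjoint modules.
s1 = dict(length=10, fan_in=2, fan_out=3)
s2 = dict(length=20, fan_in=1, fan_out=2)

# Length, fan-in, and fan-out of the two modules put together.
combined = {key: s1[key] + s2[key] for key in s1}

whole = information_flow(**combined)
parts = information_flow(**s1) + information_flow(**s2)
print(whole, parts)   # 6750 440: far from additive

# Fan-in taken alone, without the exponent, is additive by construction.
print(combined["fan_in"], s1["fan_in"] + s2["fan_in"])   # 3 3
```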

Similar measures have been used in [C90] and referred to as structural complexity (SC) and 
defined as: 

\[ SC = \frac{\sum_{i \in [1..n]} \text{fan-out}^2(\text{subroutine}_i)}{n} \]

Once again, property Complexity.5 does not hold because fan-out is squared in the formula. 

A metric suite for object-oriented design is proposed in [CK94]. A system is an object-oriented 
design, modules are classes, elements are either methods or instance variables (depending 
on the measure considered) and relationships are calls to methods or uses of instance variables by 
other methods. These measures are validated against Weyuker's properties for complexity 
measures, thereby implying that they were complexity measures. However, none of the 
measures defined by [CK94] is a complexity measure according to our properties: 

Weighted Methods per Class (WMC) and Number Of Children of a class (NOC) are size 
measures (see Section 3.1); 

Depth of Inheritance Tree (DIT) is a length measure (see Section 3.2); 

Coupling Between Object classes (CBO) is a coupling measure (see Section 3.5); 

Response For a Class (RFC) is a size and coupling measure (see Sections 3.1 and 3.5); 

Lack of Cohesion in Methods (LCOM) cannot be classified in our framework. This is 
consistent with what was said in the introduction: our framework does not cover all 
possible measurement concepts. 

This is not surprising. In [CK94], it is shown that none of the above measures satisfies 
Weyuker's property 9, which is a weaker form of property Complexity.4 (see Section 4). 


3.4. Cohesion 

Motivation 

The concept of cohesion has been used with reference to modules or modular systems. It assesses 
the tightness with which "related" program features are "grouped together" in systems or modules. 
It is assumed that the better the programmer is able to encapsulate related program features 
together, the more reliable and maintainable the system [F91]. This assumption seems to be 
supported by experimental results [BMB94(a)]. Intuitively, we expect cohesion to be non-negative 
and, more importantly, to be normalized (property Cohesion.1) so that the measure is independent 
of the size of the modular system or module. Moreover, if there are no internal relationships in a 
module or in all the modules in a system, we expect cohesion to be null (property Cohesion.2) for 
that module or for the system, since, as far as we know, there is no relationship between the 
elements and there is no evidence they should be encapsulated together. Additional internal 
relationships in modules cannot decrease cohesion since they are supposed to be additional 
evidence to encapsulate system elements together (property Cohesion.3). When two (or more) 
modules showing no relationships between them are merged, cohesion cannot increase because 
seemingly unrelated elements are encapsulated together (property Cohesion.4). 

Since the cohesion (and, as we will see in Section 3.5, the coupling) of modules and entire 
modular systems have similar sets of properties, both will be described at the same time by using 
brackets and the alternation symbol '|'. For instance, the notation [A|B], where A and B are 
phrases, will denote the fact that phrase A applies to module cohesion, and phrase B applies to 
entire system cohesion. 

Definition 6: Cohesion of a [Module | Modular System] 

The cohesion of a [module m = <Em,Rm> of a modular system MS | modular system MS] is a 
function [Cohesion(m) | Cohesion(MS)] characterized by the following properties Cohesion.1 - 
Cohesion.4. 

◊


Property Cohesion.1: Non-negativity and Normalization 

The cohesion of a [module m = <Em,Rm> of a modular system MS = <E,R,M> | modular system 
MS = <E,R,M>] belongs to a specified interval 

\[ [\ \mathrm{Cohesion}(m) \in [0,\mathit{Max}] \ \mid\ \mathrm{Cohesion}(MS) \in [0,\mathit{Max}]\ ] \qquad \text{(Cohesion.I)} \]

◊

Normalization allows meaningful comparisons between the cohesions of different 
[modules | modular systems], since they all belong to the same interval. 

Property Cohesion.2: Null Value 

The cohesion of a [module m = <Em,Rm> of a modular system MS = <E,R,M> | modular system 
MS = <E,R,M>] is null if [Rm | IR] is empty 

\[ [\ R_m = \emptyset \Rightarrow \mathrm{Cohesion}(m) = 0 \ \mid\ IR = \emptyset \Rightarrow \mathrm{Cohesion}(MS) = 0\ ] \qquad \text{(Cohesion.II)} \]

(Recall that IR is the set of intra-module relationships, defined in Definition 2.) 

◊

If there is no intra-module relationship among the elements of a (all) module(s), then the module 
(system) cohesion is null. 


Property Cohesion.3: Monotonicity 

Let MS' = <E,R',M'> and MS" = <E,R",M"> be two modular systems (with the same set of 
elements E) such that there exist two modules m' = <Em,Rm'> and m" = <Em,Rm"> (with the 
same set of elements Em) belonging to M' and M" respectively, such that $R' - R_{m'} = R'' - R_{m''}$, and 
$R_{m'} \subseteq R_{m''}$ (which implies $IR' \subseteq IR''$). Then, 

\[ [\ \mathrm{Cohesion}(m') \leq \mathrm{Cohesion}(m'') \ \mid\ \mathrm{Cohesion}(MS') \leq \mathrm{Cohesion}(MS'')\ ] \qquad \text{(Cohesion.III)} \]

◊

Adding intra-module relationships does not decrease [module | modular system] cohesion. For 
instance, suppose that systems S, S', and S" in Figure 3 are viewed as modular systems MS = 
<E,R,M>, MS' = <E',R',M'>, and MS" = <E",R",M"> (with M = {m1,m2,m3}, M' = 
{m'1,m'2,m'3}, and M" = {m"1,m"2,m"3}). We have [Cohesion(m'3) ≥ Cohesion(m3) | 
Cohesion(MS') ≥ Cohesion(MS)]. 

Property Cohesion.4: Cohesive Modules 

Let MS' = <E,R,M'> and MS" = <E,R,M"> be two modular systems (with the same underlying 
system <E,R>) such that $M'' = M' - \{m'_1, m'_2\} \cup \{m''\}$, with $m'_1 \in M'$, $m'_2 \in M'$, $m'' \notin M'$, and 
$m'' = m'_1 \cup m'_2$. (The two modules m'1 and m'2 are replaced by the module m", union of m'1 and 
m'2.) If no relationships exist between the elements belonging to m'1 and m'2, i.e., 
$\mathrm{InputR}(m'_1) \cap \mathrm{OutputR}(m'_2) = \emptyset$ and $\mathrm{InputR}(m'_2) \cap \mathrm{OutputR}(m'_1) = \emptyset$, then 

\[ [\ \max\{\mathrm{Cohesion}(m'_1), \mathrm{Cohesion}(m'_2)\} \geq \mathrm{Cohesion}(m'') \ \mid\ \mathrm{Cohesion}(MS') \geq \mathrm{Cohesion}(MS'')\ ] \qquad \text{(Cohesion.IV)} \]

◊

The cohesion of a [module | modular system] obtained by putting together two unrelated modules is 
not greater than the [maximum cohesion of the two original modules | the cohesion of the original 
modular system]. 

Properties Cohesion.1 - Cohesion.4 hold when applying the admissible transformation of the ratio 
scale. Therefore, there is no contradiction between our concept of cohesion and the definition of 
cohesion measures on a ratio scale. 


Examples of cohesion measures 

In [BMB94(a)], cohesion measures for high-level design are defined and validated, at both the 
abstract data type (module) and system (program) levels. For brevity's sake, the term software part 
here denotes either a module or a program. A high-level design is seen as a collection of modules, 
each of which exports and imports constants, types, variables, and procedures/functions. A widely 
accepted software engineering principle prescribes that each module be highly cohesive, i.e., its 
elements be tightly related to each other. [BMB94(a)] focuses on investigating whether high 
cohesion values are related to lower error-proneness, due to the fact that the changes required by a 
change in a module are confined in a well-encapsulated part of the overall program. To this end, 
the exported feature A is said to interact with feature B if the change of one of A's definitions or 
uses may require a change in one of B's definitions or uses. 

In the approach of the present paper, each feature exported by a module is an element of the 
system, and the interactions between them are the relationships between elements. A module 
according to [BMB94(a)] is represented by a module according to the definition of the present 
paper. At high-level design time, not all interactions between the features of a module are known, 
since the features may interact in the body of a module, and not in its visible part. Given a software 
part sp, three cohesion measures NRCI(sp), PRCI(sp), and ORCI(sp) (respectively, Neutral, 
Pessimistic, and Optimistic Ratio of Cohesive Interactions) are defined for software as follows 

\[ \mathrm{NRCI}(sp) = \frac{\#\mathrm{KnownInteractions}(sp)}{\#\mathrm{MaxInteractions}(sp) - \#\mathrm{UnknownInteractions}(sp)} \]

\[ \mathrm{PRCI}(sp) = \frac{\#\mathrm{KnownInteractions}(sp)}{\#\mathrm{MaxInteractions}(sp)} \]

\[ \mathrm{ORCI}(sp) = \frac{\#\mathrm{KnownInteractions}(sp) + \#\mathrm{UnknownInteractions}(sp)}{\#\mathrm{MaxInteractions}(sp)} \]

where #MaxInteractions(sp) is the maximum number of possible intra-module interactions between 
the features exported by each module of the software part sp. (Inter-module interactions are not 
considered cohesive; they may contribute to coupling, instead.) All three measures satisfy the 
above properties Cohesion.1 - Cohesion.4. 
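
The three ratios can be computed directly from the three counts. The sketch below uses hypothetical counts and assumes that the denominators are non-zero; the function and parameter names are illustrative and do not come from [BMB94(a)].

```python
from fractions import Fraction

def nrci(known, unknown, maximum):
    """Neutral ratio: unknown interactions are left out of the count."""
    return Fraction(known, maximum - unknown)

def prci(known, unknown, maximum):
    """Pessimistic ratio: unknown interactions are treated as absent."""
    return Fraction(known, maximum)

def orci(known, unknown, maximum):
    """Optimistic ratio: unknown interactions are treated as present."""
    return Fraction(known + unknown, maximum)

# Hypothetical software part: 10 possible intra-module interactions,
# 3 known to exist and 2 not yet determined at high-level design time.
counts = dict(known=3, unknown=2, maximum=10)

print(prci(**counts), nrci(**counts), orci(**counts))   # 3/10 3/8 1/2
```

For these counts the pessimistic value is the lowest and the optimistic value the highest, and all three lie in the interval [0,1], as normalization (property Cohesion.1) requires.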

Other examples of cohesion measures can be found in [BO94], where new functional 
cohesion measures are introduced. Given a procedure, function, or main program, only data 
tokens (i.e., the occurrence of a definition or use of a variable or a constant) are taken into account. 
The data slice for a data token is the sequence of all those data tokens in the program that can 
influence the statement in which the data token appears, or can be influenced by that statement. 
Being a sequence, a data slice is ordered: it lists its data tokens in order of appearance in the 
procedure, function or main program. If more than one data slice exists, some data tokens may 
belong to more than one data slice: these are called glue tokens. A subset of the glue tokens may 
belong to all data slices: these are called super-glue tokens. Functional cohesion measures are 
defined based on data tokens, glue tokens, and super-glue tokens. This approach can be 
represented in our framework as follows. A data token is an element of the system, and a data slice 
is represented as a sequence of nodes and arcs. The resulting graph is a Directed Acyclic Graph, 
which represents a module. ([BO94] introduces functional cohesion measures for single 
procedures, functions, or main programs.) Given a procedure, function, or main program p, the 
following measures SFC(p) (Strong Functional Cohesion), WFC(p) (Weak Functional Cohesion), 
and A(p) (adhesiveness) are introduced 

\[ \mathrm{SFC}(p) = \frac{\#\mathrm{SuperGlueTokens}}{\#\mathrm{AllTokens}} \]

\[ \mathrm{WFC}(p) = \frac{\#\mathrm{GlueTokens}}{\#\mathrm{AllTokens}} \]

\[ \mathrm{A}(p) = \frac{\sum_{GT \in \mathrm{GlueTokens}} \#\mathrm{SlicesContainingGlueToken}\ GT}{\#\mathrm{AllTokens} \cdot \#\mathrm{DataSlices}} \]

It can be shown that these measures satisfy the above properties Cohesion.1 - Cohesion.4. 
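
The three functional cohesion measures can be sketched from the definitions above. The tokens and data slices below are hypothetical, and the example assumes that there is more than one data slice and that each slice is given simply as the list of tokens it contains.

```python
from fractions import Fraction

def functional_cohesion(all_tokens, data_slices):
    """Return (SFC, WFC, A) for one procedure, function, or main program."""
    # Number of data slices in which each data token appears.
    count = {t: sum(t in s for s in data_slices) for t in all_tokens}
    glue = [t for t in all_tokens if count[t] > 1]
    super_glue = [t for t in glue if count[t] == len(data_slices)]
    sfc = Fraction(len(super_glue), len(all_tokens))
    wfc = Fraction(len(glue), len(all_tokens))
    adhesiveness = Fraction(sum(count[t] for t in glue),
                            len(all_tokens) * len(data_slices))
    return sfc, wfc, adhesiveness

# Hypothetical procedure with six data tokens and three data slices.
tokens = ["t1", "t2", "t3", "t4", "t5", "t6"]
slices = [["t1", "t2", "t3", "t4"],
          ["t3", "t4", "t5"],
          ["t4", "t6"]]

sfc, wfc, a = functional_cohesion(tokens, slices)
print(sfc, wfc, a)   # 1/6 1/3 5/18
```

Here t3 and t4 are glue tokens and t4 is the only super-glue token, so in this example SFC is the smallest of the three values and WFC the largest.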

3.5. Coupling 

Motivation 

The concept of coupling has been used with reference to modules or modular systems. Intuitively, 
it captures the amount of relationship between the elements belonging to different modules of a 
system. Given a module m, two kinds of coupling can be defined: inbound coupling and outbound 
coupling. The former captures the amount of relationships from elements outside m to elements 
inside m; the latter the amount of relationships from elements inside m to elements outside m. 

We expect coupling to be non-negative (property Coupling.1), and null when there are no 
relationships among modules (property Coupling.2). When additional relationships are created 
across modules, we expect coupling not to decrease since these modules become more 
interdependent (property Coupling.3). Merging modules can only decrease coupling since there 
may exist relationships among them and therefore, inter-module relationships may have 
disappeared (property Coupling.4, property Coupling.5). 

In what follows, when referring to module coupling, we will use the word coupling to 
denote either inbound or outbound coupling, and OuterR(m) to denote either InputR(m) or 
OutputR(m). 

Definition 7: Coupling of a [Module | Modular System] 

The coupling of a [module m = <Em,Rm> of a modular system MS | modular system MS] is a 
function [Coupling(m) | Coupling(MS)] characterized by the following properties Coupling.1 - 
Coupling.5. 

◊


Property Coupling.1: Non-negativity 

The coupling of a [module m = <Em,Rm> of a modular system | modular system MS] is 
non-negative 

\[ [\ \mathrm{Coupling}(m) \geq 0 \ \mid\ \mathrm{Coupling}(MS) \geq 0\ ] \qquad \text{(Coupling.I)} \]

◊


Property Coupling.2: Null Value

The coupling of a [module m = <Em,Rm> of a modular system | modular system MS = <E,R,M>]
is null if [OuterR(m) | R-IR] is empty

[ OuterR(m) = ∅ ⇒ Coupling(m) = 0 | R-IR = ∅ ⇒ Coupling(MS) = 0 ] (Coupling.II)

◊

Property Coupling.3: Monotonicity

Let MS' = <E,R',M'> and MS" = <E,R",M"> be two modular systems (with the same set of
elements E) such that there exist two modules m' ∈ M', m" ∈ M" such that R' - OuterR(m') = R" -
OuterR(m"), and OuterR(m') ⊆ OuterR(m"). Then,

[ Coupling(m') ≤ Coupling(m") | Coupling(MS') ≤ Coupling(MS") ] (Coupling.III)

◊

Adding inter-module relationships does not decrease coupling. For instance, if systems S and S"
in Figure 3 are viewed as modular systems (see Section 3.4), we have [Coupling(m"1) ≥
Coupling(m1) | Coupling(MS") ≥ Coupling(MS)].

Property Coupling.4: Merging of Modules

Let MS' = <E',R',M'> and MS" = <E",R",M"> be two modular systems such that E' = E", R' =
R", and M" = M' - {m'1,m'2} ∪ {m"}, where m'1 = <Em'1,Rm'1>, m'2 = <Em'2,Rm'2>, and m"
= <Em",Rm">, with m'1 ∈ M', m'2 ∈ M', m" ∉ M', and Em" = Em'1 ∪ Em'2 and Rm" = Rm'1 ∪
Rm'2. (The two modules m'1 and m'2 are replaced by the module m", whose elements and
relationships are the union of those of m'1 and m'2.) Then

[ Coupling(m'1) + Coupling(m'2) ≥ Coupling(m") |
Coupling(MS') ≥ Coupling(MS") ] (Coupling.IV)

◊

The coupling of a [module | modular system] obtained by merging two modules is not greater than
the [sum of the couplings of the two original modules | coupling of the original modular system],
since the two modules may have common inter-module relationships. For instance, suppose that
the modular system MS12 in Figure 6 is obtained from the modular system MS in Figure 2 by
merging modules m1 and m2 into module m12. Then, we have [Coupling(m1) + Coupling(m2) ≥
Coupling(m12) | Coupling(MS) ≥ Coupling(MS12)].



Figure 6. The effect of merging modules on coupling.

Property Coupling.5: Disjoint Module Additivity

Let MS' = <E,R,M'> and MS" = <E,R,M"> be two modular systems (with the same underlying
system <E,R>) such that M" = M' - {m'1,m'2} ∪ {m"}, with m'1 ∈ M', m'2 ∈ M', m" ∉ M', and
m" = m'1 ∪ m'2. (The two modules m'1 and m'2 are replaced by the module m", union of m'1 and
m'2.) If no relationships exist between the elements belonging to m'1 and m'2, i.e., InputR(m'1) ∩
OutputR(m'2) = ∅ and InputR(m'2) ∩ OutputR(m'1) = ∅, then

[ Coupling(m'1) + Coupling(m'2) = Coupling(m") |
Coupling(MS') = Coupling(MS") ] (Coupling.V)

◊

The coupling of a [module | modular system] obtained by merging two unrelated modules is equal to
the [sum of the couplings of the two original modules | coupling of the original modular system].

Properties Coupling.1 - Coupling.5 hold when applying the admissible transformations of the ratio
scale. Therefore, there is no contradiction between our concept of coupling and the definition of
coupling measures on a ratio scale.
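The coupling properties can be checked on a small example. The sketch below is only an illustration under stated assumptions: relationships are modelled as directed pairs of elements, modules as sets of elements, and the measure (the number of relationships in OutputR(m)) is one hypothetical choice of outbound coupling, not a measure prescribed by the definition. All element and module names are invented.

```python
def output_r(module, relationships):
    """OutputR(m): relationships from an element inside m to one outside m."""
    return {(a, b) for (a, b) in relationships if a in module and b not in module}

def input_r(module, relationships):
    """InputR(m): relationships from an element outside m to one inside m."""
    return {(a, b) for (a, b) in relationships if a not in module and b in module}

def outbound_coupling(module, relationships):
    """Hypothetical measure: the number of relationships in OutputR(m)."""
    return len(output_r(module, relationships))

R = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("e", "f")}
m1, m2, m3 = {"a", "b"}, {"c", "d"}, {"e", "f"}

# Coupling.4: merging m1 and m2 hides the relationships (a, c) and (b, d).
merged_12 = outbound_coupling(m1 | m2, R)

# Coupling.5: no relationship links m1 and m3, so their couplings add up.
unrelated = (not (input_r(m1, R) & output_r(m3, R))
             and not (input_r(m3, R) & output_r(m1, R)))
merged_13 = outbound_coupling(m1 | m3, R)

# Coupling.3: a new inter-module relationship cannot lower the value.
with_extra = outbound_coupling(m1, R | {("a", "e")})
```

Here the couplings of m1, m2 and m3 are 2, 1 and 0. Merging m1 with m2 gives 1, which does not exceed 2 + 1, while merging the unrelated modules m1 and m3 gives exactly 2 + 0.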

Examples and counterexamples of coupling measures 

Fenton has defined an ordinal coupling measure between pairs of subroutines [F91] as follows:

\[ C(S,S') = i + \frac{n}{n+1} \]

where i is the number corresponding to the worst coupling type (according to Myers' ordinal scale
[F91]) and n the number of interconnections between S and S', i.e., global variables and formal
parameters. In this case, systems are programs, modules are subroutines, elements are formal
parameters and global variables. If coupling for the whole system is defined as the sum of coupling
values between all subroutine pairs, properties Coupling.1 - Coupling.5 hold for this measure and
we label it as a coupling measure. However, Fenton proposes to calculate the median of all the pair
values as a system coupling measure. In this case, property Coupling.3 does not hold since the
median may decrease when inter-module relationships are added. Similarly for Coupling.4, when
subroutines are merged and inter-module relationships are lost, the median may increase.
Therefore, the system coupling measure proposed by Fenton is not a coupling measure according
to our definitions.
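The difference between the sum and the median can be seen on a small numeric example for the merging case (Coupling.4). The sketch below is an illustration only: the coupling types i and interconnection counts n are invented, and it assumes that when two subroutines are merged, the pair formed with a third subroutine takes the worst of the two coupling types and the total of their interconnections.

```python
from fractions import Fraction
from statistics import median

def pair_coupling(i, n):
    """Pairwise value C(S,S') = i + n/(n+1)."""
    return Fraction(i) + Fraction(n, n + 1)

# Before merging: three subroutines, each pair given as (worst type i, interconnections n).
before = {("S1", "S2"): (1, 1), ("S1", "S3"): (3, 1), ("S2", "S3"): (3, 1)}
# After merging S1 and S2 into S12 (assumption: i is the worst type, n adds up).
after = {("S12", "S3"): (3, 2)}

def values(pairs):
    """Pairwise coupling values of a system."""
    return [pair_coupling(i, n) for (i, n) in pairs.values()]

sum_before, sum_after = sum(values(before)), sum(values(after))
med_before, med_after = median(values(before)), median(values(after))
```

With these numbers the sum drops from 17/2 to 11/3 after the merge, as Coupling.4 requires, whereas the median rises from 7/2 to 11/3.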

In [BMB94(a)], coupling measures for high-level design are defined and validated, at both
the module (abstract data type) and system (program) levels. They are based on the notion of
interaction introduced in the examples of Section 3.4. Import coupling of a module m is defined as
the extent to which m depends on imported external data declarations. Similarly, export coupling of
m is defined as the extent to which m's data declarations affect the other data declarations in the
system. At the system level, coupling is the extent to which the modules are related to each other.
Given a module m, Import Coupling of m (denoted by IC(m)) is the number of interactions
between data declarations external to m and the data declarations within m. Given a module m,
Export Coupling of m (denoted by EC(m)) is the number of interactions between the data
declarations within m and the data declarations external to m. As shown in [BMB94(a)], our
coupling properties hold for these measures.

Coupling Between Object classes (CBO) of a class is defined in [CK94] as the number of
other classes to which it is coupled. It is a coupling measure. Properties Coupling.1 and
Coupling.2 are obviously satisfied. Property Coupling.3 is satisfied, since CBO cannot decrease
by adding one more relationship between features belonging to different classes (i.e., one class
uses one more method or instance variable belonging to another class). Property Coupling.4 is
satisfied: CBO can only remain constant or decrease when two classes are grouped into one.
Property Coupling.5 is also satisfied.
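The behavior of CBO under added relationships and under merging can be sketched as follows. This is a minimal illustration, assuming that two classes count as coupled when a feature of one uses a feature of the other, in either direction; the class names, feature names and the helper `cbo` are hypothetical. It only exercises properties Coupling.3 and Coupling.4 on one invented example.

```python
def cbo(name, classes, uses):
    """Number of other classes that class `name` is coupled to."""
    owner = {feature: cls for cls, features in classes.items() for feature in features}
    coupled = set()
    for user, used in uses:
        a, b = owner[user], owner[used]
        if a != b and name in (a, b):
            coupled.add(b if a == name else a)
    return len(coupled)

classes = {"A": {"A.f", "A.x"}, "B": {"B.g"}, "C": {"C.h"}}
uses = {("A.f", "B.g"), ("A.f", "C.h"), ("B.g", "C.h")}

# Coupling.3: one more use between C and A leaves CBO(A) unchanged, never lower.
more_uses = uses | {("C.h", "A.x")}

# Coupling.4: merging A and B hides the use between them.
merged = {"AB": classes["A"] | classes["B"], "C": classes["C"]}
```

In this example CBO(A) and CBO(B) are both 2, CBO(A) stays at 2 after the extra use, and the merged class AB has a CBO of 1.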

Response For a Class (RFC) [CK94] is a size and a coupling measure at the same time (see 
Section 3.1). Methods are elements, calls are relationships, classes are modules. Coupling.3 
holds, since adding outside method calls to a class can only increase RFC and Coupling.4 holds 
because merging classes does not change RFC's value since RFC does not distinguish between 
inside and outside method calls. Similarly, when there are no calls between the classes' methods, 
Coupling.5 holds. This result is to be expected since RFC is the result of the addition of two terms: 
the number of methods in the class, a size measure, and the number of methods called, a coupling 
measure. 


3.6. Comparison of Concept Properties 

We want to summarize the important differences and similarities between the system concepts 
introduced in this paper. Table 1 uses only criteria that can be compared across the concepts of 
size, length, complexity, cohesion, and coupling. First, it is important to recall that coupling and 
cohesion are only defined in the context of modular systems, whereas size, length and complexity 
are defined for all systems. 

Second, the concepts appear to have the null value (second column) and monotonicity
(third column) properties based on different sets. The behavior of a measure with respect to
variations in such sets characterizes the nature of the measure itself, i.e., the concept(s) it captures.
As RFC, defined in [CK94], shows (see Sections 3.1 and 3.5), the same measure may satisfy the 
sets of properties associated with different concepts. As a matter of fact, similar sets of properties 
associated with different concepts are not contradictory. 

Third, when systems are made of disjoint modules, size, complexity and coupling are
additive (properties Size.3, Complexity.5, and Coupling.5). Cohesion and length are not additive.

Concepts\Properties   Null Value   Monotonicity   Additivity
Size                  E = ∅        E              Yes
Length                E = ∅        R              No
Complexity            R = ∅        R              Yes
System Cohesion       IR = ∅       IR             No
System Coupling       R-IR = ∅     R-IR           Yes

Table 1: Comparison of concept properties


This summary shows that these concepts are really different with respect to basic properties.
Therefore, it appears that desirable properties are likely to vary from one measurement concept to 
another. 


4. Comparison with Related Work 

We mainly compare our approach with the other approaches for defining sets of properties for 
software complexity measures, because they have been studied more extensively and thoroughly 
than other kinds of measures. Besides, we compare our approach with the axioms introduced by 
Fenton and Melton [FM90] for software coupling measures. As already mentioned, our approach 
generalizes previous work on properties for defining complexity measures. Unlike previous 
approaches, it is not constrained to deal with software code only, but, because of its generality, 
can be applied to other artifacts produced during the software lifecycle, namely, software 
specifications and designs. Moreover, it is not defined based on some control flow operations, like 
sequencing or nesting, but on a general representation, i.e., a graph.

Weyuker 3 

Weyuker's work [W88] is one of the first attempts to formalize the fuzzy concept of program 
complexity. This work has been discussed by many authors [CK94, F91, LJS91, TZ92, Z91] and
is still a point of reference and comparison for anyone investigating the topic of software 
complexity. 

To make Weyuker's properties comparable with ours, we will assume that a program
according to Weyuker is a system according to our definition; a program body is a module of a
system. A whole program is built by combining program bodies, by means of sequential,
conditional, and iterative constructs (plus the program and output statements, which can be seen as
"special" program bodies), and, correspondingly, a system can be built from its constituent
modules. Since some of Weyuker's properties are based on the sequencing between pairs of
program bodies P and Q, we provide more details about the representation of sequencing in our
framework. Sequencing of program bodies P and Q is obtained via the composition operation
(P;Q). Correspondingly, if SP = <EP,RP> and SQ = <EQ,RQ> are the modules representing the
two program bodies P and Q 4, then, we will denote the representation of P;Q as SP;Q =
<EP;Q,RP;Q>. In what follows, we will assume that EP;Q = EP ∪ EQ and RP;Q ⊇ RP ∪ RQ, i.e.,
the representation of the composition of two program bodies contains the elements of the
representation of each program body, and at least contains all the relationships belonging to each of
the representations of program bodies. In other words, SP and SQ are modules of SP;Q.


3 We will list properties/axioms by the initial of the proponents. So, Weyuker's properties will be referred to as W1,
W2, ..., W9, Tian and Zelkowitz's as TZ1 to TZ5, and Lakshmanian et alii's as L1 to L9.

4 In what follows, we will use the notation SP = <EP,RP> to denote the representation of program body P.

W1: A complexity measure must not be "too coarse" (1).

∃ SP, SQ: Complexity(SP) ≠ Complexity(SQ)

W2: A complexity measure must not be "too coarse" (2). Given the nonnegative number c, there
are only finitely many systems of complexity c.

W3: A complexity measure must not be "too fine." There are distinct systems SP and SQ such that
Complexity(SP) = Complexity(SQ).

W4: Functionality. There is no one-to-one correspondence between functionality and complexity

∃ SP, SQ: P and Q are functionally equivalent and Complexity(SP) ≠ Complexity(SQ)

W5: Monotonicity with respect to composition.

∀ SP, SQ: Complexity(SP) ≤ Complexity(SP;Q) and Complexity(SQ) ≤ Complexity(SP;Q)

W6: The contribution of a module in terms of the overall system complexity may depend on the
rest of the system.

(a) ∃ SP, SQ, ST: Complexity(SP) = Complexity(SQ) and Complexity(SP;T) ≠ Complexity(SQ;T)

(b) ∃ SP, SQ, ST: Complexity(SP) = Complexity(SQ) and Complexity(ST;P) ≠ Complexity(ST;Q)

W7: A complexity measure is sensitive to the permutation of statements.

∃ SP, SQ: Q is formed by permuting the order of statements of P and Complexity(SP) ≠
Complexity(SQ)

W8: Renaming. If P is a renaming of Q, then Complexity(SP) = Complexity(SQ).

W9: Module monotonicity.

∃ SP, SQ: Complexity(SP) + Complexity(SQ) < Complexity(SP;Q)

Analysis of Weyuker's properties

Wl, W2, W3, W4, W8: These are not implied by our properties, but they do not contradict 
any of them, so they can be added to our set, if desired. However, we think that these properties 
are general to all syntactically-based product measures and do not appear useful in our framework 
to differentiate concepts. 

W5: This is implied by our properties, as shown by inequality (Complexity.VI), since SP and
SQ are modules of SP;Q.

W6, W7: These properties are not implied by the above properties Complexity.1 -
Complexity.5. However, they show a very important and delicate point in the context of
complexity measure definition.

By assuming properties W6(a) and W6(b) to be false, one forces all complexity measures
to be strongly related to control flow, since this would exclude that the composition of two
program bodies may yield additional relationships between elements (e.g., data declarations) of the
two program bodies. If properties W6(a) and W6(b) are assumed true, one forces all complexity
measures to be sensitive to at least one other kind of additional relationship.

Similarly, W7 states that the order of the statements, and therefore the control flow, should
have an impact on all complexity measures. By assuming property W7 to be false, one forces all
complexity measures to be insensitive to the ordering of statements. If property W7 is assumed
true, one forces all complexity measures to be somehow sensitive to the ordering of statements,
which may not always be useful.

W8: We analyze this property again, to better explain the relationship between complexity and
understandability. According to this property, renaming does not affect complexity. However, it is 
a fact that renaming program variables by absurd or misleading names greatly impairs 
understandability. This shows that other factors, besides complexity, affect understandability and 
the other external qualities of software that are affected by complexity. 

As for properties W1-W8, our approach is somewhat more liberal than Weyuker's. For 
instance, the constant null function is an acceptable complexity measure according to our 
properties, while it is not acceptable according to Weyuker's properties. It is evident that the 
usefulness of such a complexity measure is questionable. We think that properties should be used 
to check whether a measure actually addresses a given concept (e.g., complexity). However, given 
any set of properties, it is almost always possible to build a measure that satisfies them, but is of 
no practical interest (see [CS91]). At any rate, this is not a sensible reason to reject a set of 
properties associated with a concept (how many senseless measures could be defined that satisfy the
three properties that characterize distance!). Rather, measures that satisfy a set of properties must 
be later assessed with regard to their usefulness. 

W9: This is probably the most controversial property. The above properties Complexity.1 -
Complexity.5 imply it. Actually, our properties imply the stronger form of W9, the unnumbered
property following W9 in Weyuker's paper [W88] (see also [P84])

∀ SP, SQ: Complexity(SP) + Complexity(SQ) ≤ Complexity(SP;Q)

Weyuker rejects it on the basis that it might lead to contradictions: she argues that the effort needed
to implement or understand the composition of a program body P with itself, is probably not twice
as much as the effort needed for P alone. Our point is that complexity is not the only factor to be
taken into account when evaluating the effort needed to implement or understand a program, nor is
it proven that this effort is in any way "proportional" to product complexity.
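The representation of sequencing and the stronger form of W9 can be illustrated with a toy example. The sketch below is an assumption-laden illustration: a system is a pair (elements, relationships), the composition P;Q keeps the union of the elements and at least the union of the relationships, and the complexity measure used here (the number of relationships) is a hypothetical choice, not one proposed in the text.

```python
def compose(sp, sq, extra=frozenset()):
    """Represent P;Q: union of elements, and at least the union of relationships."""
    (ep, rp), (eq, rq) = sp, sq
    return (ep | eq, rp | rq | set(extra))

def complexity(system):
    """Toy complexity measure (an assumption): the number of relationships."""
    _, relationships = system
    return len(relationships)

sp = ({"p1", "p2"}, {("p1", "p2")})
sq = ({"q1", "q2"}, {("q1", "q2")})

# Composition that introduces one relationship between the two program bodies.
spq = compose(sp, sq, extra={("p2", "q1")})
# Composition that introduces none.
spq_plain = compose(sp, sq)
```

For this measure, Complexity(SP) + Complexity(SQ) equals Complexity(SP;Q) when the composition adds no relationship, and is strictly smaller as soon as one relationship links the two program bodies.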

Fenton 

In addition to Weyuker's work, Fenton [F94] shows that, based on measurement-theoretic
mathematical grounds, there is no chance that a general measure for software complexity will ever
be found, nor even for control flow complexity, i.e., a more specific kind of complexity. We
totally agree with that. By no means do we aim at defining a single complexity measure, which
captures all kinds of complexity in a software artifact. Instead, our set of properties defines
constraints for any specific complexity measure, whatever facet of complexity it addresses.

Fenton and Melton [FM90] introduced two axioms that they believe should hold for
coupling measures. Both axioms assume that coupling is a measure of connectivity of a system
represented by its module design chart (or structure chart). The first axiom is similar to our
monotonicity property (Coupling.3). It states that if the only difference between two module
design charts D and D' is an extra interconnection in D', then the coupling of D' is higher than the
coupling of D. The second axiom basically states that system coupling should be independent from
the number of modules in the system. If a module is added and shows the same level of pairwise
coupling as the already existing modules, then the coupling of the system remains constant.
According to our properties, coupling is seen as a measure which is to a certain extent dependent
on the number of modules in the system and we therefore do not have any equivalent axiom. This
shows that the sets of properties that can be defined above are, to some extent, subjective.

Zuse 

In his article in the Encyclopaedia of Software Engineering [ESE94 pp. 131-165], Zuse applies a
measurement-theoretic approach to complexity measures. The focus is on the conditions that
should be satisfied by empirical relational systems in order to provide them with additive ratio scale
measures. This class of measures is a subset of ratio scale measures, characterized by the additivity
property (Theorems 2 and 3 of [ESE94]). Given the set P of flowgraphs and a binary operation *
between flowgraphs (e.g., concatenation), additive ratio scale complexity measures are such that,
for each pair of flowgraphs P1, P2,

Complexity(P1*P2) = Complexity(P1) + Complexity(P2)

This property shows that a different concept of complexity is defined by Zuse, with respect to that 
defined by Weyuker's (W9) and our properties (Complexity.4). It is our belief that, by requiring 
that complexity measures be additive, important aspects of complexity may not be fully captured, 
and complexity measures actually become quite similar to size measures. Considering complexity 
as additive means that, when two modules are put together to form a new system, no additional 
dependencies between the elements of the modules should be taken into account in the computation 
of the system complexity. We believe this is a very questionable assumption for product 
complexity. 

Tian and Zelkowitz 

Tian and Zelkowitz [TZ92] have provided axioms (necessary properties) for complexity measures
and a classification scheme based on additional program characteristics that identify important 
measure categories. In the approach, programs are represented by means of their abstract syntax 
trees (e.g., parse trees). To translate this representation into our framework, we will assume that 
the whole program, represented by the entire tree, is a system, and that any part of a program 
represented by a subtree is a module. 

TZ1: Systems with identical functionality are comparable, i.e., there is an order relation between
them with respect to complexity. 

TZ2: A system is comparable with its module(s). 

TZ3: Given a system SQ and any module SP whose root, in the abstract tree representation, is "far
enough" from the root of SQ, then SP is not more complex than SQ. In other words, "small"
modules of a system are no more complex than the system.

TZ4: If an intuitive complexity order relation exists between two systems, it must be preserved by 
the complexity measure (it is a weakened form of the representation condition of Measurement 
Theory [F91]). 

TZ5: Measures must not be too coarse and must show sufficient variability. 

TZ1, TZ2, TZ5 do not differentiate software characteristics (concepts) and can be used for all 
syntactic product measures. TZ3 can be derived from our set of properties. TZ4 captures the basic 
purpose behind the definition of all measures: preserving an intuitive order on a set of software 
artifacts [MGB90]. 

The additional set of properties which is presented in [TZ92] is used to define a measure 
classification system. It determines whether or not a measure is based exclusively on the abstract 
syntax tree of the program, whether it is sensitive to renaming, whether it is sensitive to the context 
of definition or use of the measured program, whether it is determined entirely by the performed 
program operations regardless of their order and organization, and whether concatenation of
programs always contributes positively toward the composite program complexity (i.e., system
monotonicity).

Some of these properties are related to the properties defined in this paper and we believe 
they are characteristic properties of distinct system concepts (e.g., system monotonicity). Others 
do not differentiate the various concepts associated with syntactically-based measures (e.g., 
renaming). 

Lakshmanian et al.

Lakshmanian et al. [LJS91] have attempted to define necessary properties for software complexity
measures based on control flow graphs. In order to make these properties comparable to ours, we
will use a notation similar to the one used to introduce Weyuker's properties. A program according
to Lakshmanian et al. (represented by a control flow graph) is a system according to our definition,
and a program segment is a module. In addition to sequencing, these properties use the nesting
program construct denoted as "@": "A program segment Z is said to be obtained by nesting
[program segment] Y at the control location i in [program segment] X (denoted by Y@Xi) if the
program segment X has at least one conditional branch, and if Y is embedded at location i in X in
such a way that there exists at least one control flow path in the combined code Z that completely
skips Y." "The notation Y@X refers to any nesting of Y in X if the specific location in X at which
Y is embedded is immaterial."

In what follows, X, Y, Z will denote programs or program segments; SX, SY, SZ will
denote the corresponding systems or modules according to our definition. Lakshmanian et al.
[LJS91] introduce nine properties. However, only five of them can be considered basic, since
the remaining four can be derived from them. Therefore, below we will only discuss the
compatibility of the basic properties with respect to our properties.

L1: Non-negativity.

L1(a): Null value.

If the program only contains sequential code (referred to as a basic block B) then

Complexity(SB) = 0

L1(b): Positivity.

If the program X is not a basic block, then

Complexity(SX) > 0

◊

Property L1 does not contradict any of our properties (in particular, Complexity.1 and
Complexity.2).

L5: Additivity under sequencing.

Complexity(SX;Y) = Complexity(SY) + Complexity(SX)

◊

This property does not contradict properties Complexity.4 and Complexity.5, where the equality
sign is allowed. By requiring that complexity be additive under sequencing, Lakshmanian et al. take
a viewpoint which is very similar to that of Zuse.

L6: Functional independence under nesting.

Adding a basic block B to a system X through nesting does not increase its complexity

Complexity(SB@X) = Complexity(SX)

◊

L7: Monotonicity under nesting.

Complexity(SY@Xi) < Complexity(SZ@Xi) if Complexity(SY) < Complexity(SZ)

◊

These properties are compatible with our properties.

L9: Sensitivity to nesting.

Complexity(SX;Y) < Complexity(SY@X) if Complexity(SY) > 0

◊

This property does not contradict our properties.

In conclusion, none of the above properties contradicts our properties. However, the scope of 
these properties is limited to the sequencing and nesting of control flow graphs, and therefore to 
the study of control flow complexity. 
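To make the five basic properties concrete, the sketch below builds a toy control flow complexity measure and checks it against L1, L5, L6, L7 and L9 on a few programs. Everything here is an assumption made for illustration: programs are small trees made of hypothetical `Block`, `Cond`, `Seq` and `Nest` nodes, the control location i is ignored, and the weighting (a nested segment counts twice) is invented rather than taken from [LJS91].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    """Basic block: purely sequential code."""

@dataclass(frozen=True)
class Cond:
    """A single conditional branch with empty arms."""

@dataclass(frozen=True)
class Seq:
    """Sequencing X;Y."""
    x: object
    y: object

@dataclass(frozen=True)
class Nest:
    """Nesting Y@X: Y embedded at some control location of X."""
    y: object
    x: object

def complexity(p):
    """Toy measure: conditionals count 1, nested segments count twice."""
    if isinstance(p, Block):
        return 0
    if isinstance(p, Cond):
        return 1
    if isinstance(p, Seq):
        return complexity(p.x) + complexity(p.y)
    if isinstance(p, Nest):
        return complexity(p.x) + 2 * complexity(p.y)
    raise TypeError(f"unknown program node: {p!r}")

B, X, Y = Block(), Cond(), Cond()
Z = Nest(Cond(), Cond())  # a segment more complex than Y
```

With this measure, sequencing is additive, nesting a basic block leaves the value unchanged, and nesting a non-trivial segment costs more than sequencing it, which is the behavior required by L5, L6 and L9.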

As for the other properties, we now show how they can be derived from L1, L5, L6, L7, and L9.

L2: Functional independence under sequencing.

Complexity(SX;B) = Complexity(SX)

This property follows from L5 (first equality below) and L1 (second equality below):

Complexity(SX;B) = Complexity(SX) + Complexity(SB) = Complexity(SX)

◊


L3: Symmetry under sequencing.

Complexity(SX;Y) = Complexity(SY;X)

This property follows from L5 (both equalities)

Complexity(SX;Y) = Complexity(SX) + Complexity(SY) = Complexity(SY;X)

◊


L4: Monotonicity under sequencing.

Complexity(SX;Y) < Complexity(SX;Z) if Complexity(SY) < Complexity(SZ)
Complexity(SX;Y) = Complexity(SX;Z) if Complexity(SY) = Complexity(SZ)

This property follows from L5:

if Complexity(SY) < Complexity(SZ), then
Complexity(SX;Y) = Complexity(SX) + Complexity(SY)
< Complexity(SX) + Complexity(SZ) = Complexity(SX;Z)
if Complexity(SY) = Complexity(SZ), then
Complexity(SX;Y) = Complexity(SX) + Complexity(SY)
= Complexity(SX) + Complexity(SZ) = Complexity(SX;Z)

◊


L8: Monotonicity under nesting.

Complexity(SY) < Complexity(SY@X)

This property follows from L1 (first inequality below, since Complexity(SX) > 0, as X cannot be a
basic block), L5 (equality below) and L9 (second inequality below)

Complexity(SY) < Complexity(SX) + Complexity(SY)
= Complexity(SX;Y) < Complexity(SY@X)

◊

5. Conclusion and Directions for Future Work


In order to provide some guidelines for the analyst in charge of defining product measures, we 
propose a framework for software measurement where various software measurement concepts are 
distinguished and their specific properties defined in a generic manner. Such a framework is, by its 
very nature, somewhat subjective and there are possible alternatives to it. However, it is a practical
framework since the properties we capture are, we believe, interesting and all the concepts can be 
distinguished by different sets of properties. 

For example, these properties can be used to guide the search for new product measures as
shown in [BMB94(b)]. Moreover, we hope this framework will help avoid future confusion, often
encountered in the literature, about what properties product measures should or should not have.
Studying measure properties is important in order to provide discipline and rigor to the search for
new product measures. However, the relevancy of a property to a given measure must be assessed
in the context of a well defined measurement concept, e.g., one should not attempt to verify if a
length measure is additive.

This framework does not prevent useless measures from being defined. The usefulness of a
measure can only be assessed in a given context (i.e., with respect to a given experimental goal and
environment) and after a thorough experimental validation [BMB94(b)]. This framework is not a
global answer to the problems of software engineering measurement; it is just one of the necessary
components of a measure validation process as presented in [BMB94(b)].

Future research will include the definition of more specific measurement frameworks for
particular product abstractions, e.g., control flow graphs, data dependency graphs. Also, new
concepts could be defined, such as information content (in the information theory sense).

Abstract

Component reuse is widely considered vital for obtaining significant improvement 
in development productivity. However, as an organization adopts a reuse-oriented 
development process, the nature of the problems in development is likely to change. In 
this paper, we use a measurement-based approach to better understand and evaluate 
an evolving reuse process. More specifically, we study the effects of reuse across seven 
projects in a narrow domain from a single development organization. An analysis of the
errors that occur in new and reused components across all phases of system development
provides insight into the factors influencing the reuse process. We found significant
differences between errors associated with new and various types of reused components
in terms of the types of errors committed, when errors are introduced, and the effect
that the errors have on the development process.


1 Introduction 


Reuse has been advocated as a technique with great potential to increase software 
development productivity, reduce development cycle time, and improve product quality 
[AM87, Bro87, BP88]. However, reuse will not just happen; rather, components must be
designed for reuse, and organizational elements must be in place to enable projects to take 
advantage of the reusable artifacts. 

Basili and Rombach present a framework of comprehensive support for reuse, including 
organizational and methodological properties necessary to maximize the benefit of reuse 
[BR91]. For reuse to attain a significant role in an environment, organizational changes 
must be made to facilitate the change in development style. Maintaining a library of reusable 
parts may require resources including personnel, hardware, and software. While increasing
the amount of reuse in an environment may reduce certain development activities (e.g.,
code creation), it will also require additional effort in other activities (e.g., searching for
components). With respect to product quality, it is also clear that “reused” does not imply
“defect-free.” An investigation into the benefits of reuse in the NASA Goddard Space Flight
Center (NASA/GSFC) showed that even among components that were intended to be reused
verbatim, while their error rate was an order of magnitude lower than newly created code, 
the error rate is still significant [TDB92]. By analyzing the nature of the defects in the reuse 
process, one can tailor the process appropriately to best achieve the organization’s goals. 

There have been several studies into techniques to stock an initial reuse library [CB91,
DK93]. One factor to be considered is the structure of the candidate reusable component.
Selby investigated various characteristics of new versus reused code in a large collection
of FORTRAN projects [Sel88]. Basili and Perricone analyzed tradeoffs between creating a
component from scratch versus modifying an existing component [BP84]. This work extends
these studies by investigating the nature of errors occurring in a reuse-oriented development
environment, and drawing conclusions as to their impact in such an environment. In
particular, we analyzed a collection of eight medium scale Ada projects developed over a
five year period in the NASA/GSFC with respect to the defects found in newly developed
and reused components. The goal of the study was to learn about the nature of problems
associated with reuse-oriented software development, thereby allowing for improvement of
the reuse process. We found significant differences between errors associated with new and
with various types of reused components in terms of when errors are being introduced, the
effect that they have on the development process, and the type of error being committed.
We also found some similarities and some differences with the findings of other investigations
into component reuse.

This paper is organized as follows. Section 2 provides a brief overview of reuse-oriented 
software development, while section 3 gives background about using error analysis for process 
improvement. Section 4 describes the goals of the study and the data analyzed. The findings 
from our analysis are presented in section 5, and section 6 summarizes and identifies the 
major conclusions. 


2 Reuse-Oriented Software Development 


Reuse has been cited as a technology with the potential to provide a significant increase 
in software development productivity and quality. For example, Jones estimates that only 
15 percent of the developed software is unique to the applications for which it was developed 
[Jon84]. Reduced development cost is not the only benefit of reuse; in fact, the greatest 
benefit from reuse may be its impact on maintenance [LG84, Rom91]. The potential for 
substantial savings from reuse clearly exists. Unfortunately, achieving high levels of reuse 
still remains a difficult task. A number of issues must be addressed to effectively increase 
the level of reuse in an organization, including the forms of reuse, and language and organi- 
zational support to encourage reuse. 

2.1 Types of Reuse 


In this study we examined three modes of reuse: 

• verbatim reuse, in which the component is unchanged, 

• reuse with slight modification, in which the original component is slightly tailored for 
the new application, 

• reuse with extensive modification, in which the original component is extensively al- 
tered for the new application. 

While differentiating verbatim reuse and reuse via modification is trivial, distinguishing 
between slight modification and extensive modification is more difficult. Our intent is to 
distinguish between cases where a component is left essentially intact, but needs some small 
change for the new application, and cases where a component is significantly altered for its 
new use. The three types of reuse and their expected impact on development are described 
in the following paragraphs. 

Intuitively, verbatim reuse appears to hold the greatest benefit to software development. 
Development effort is minimized and verification effort is reduced, since the component has 
previously been developed, tested, and used. There may be an increased cost in integration 
effort, as the reused component may not squarely fit in the new system, and the develop- 
ers may not be as familiar with the reused component as they would be with a custom 
component. 

Another means of reuse is achieved by slight modification of an existing component. 
Here a component remains for the most part unchanged, but is adapted slightly for the new 
application. For example, a sort routine may be modified to sort a different type of objects. 
An improvement in terms of reduced development effort and increased quality is expected, 
although perhaps not to the same degree as in the reused verbatim components. Again, 
the integration of modified components may be more difficult than that of newly created 
components; but, because the modified components may be adapted to better match the 
application, the integration is perhaps not as difficult as with the verbatim reused com- 
ponents. As with verbatim reuse, there may be new errors introduced in the component 
selection process. However, since the developer does have a greater understanding of the 
implementation of the modified component, one is more likely to detect that error earlier 
than if the component was reused verbatim. 

Our third category of reuse occurs through extensive modification of an existing com- 
ponent. For example, one may want to change the underlying representation of a particular 
type while maintaining the operations on the type. If the component was not designed with 
the representation isolated in the implementation, this may require changes throughout the 
component. Reuse in this manner is likely to be beneficial only if the component is of a 
sufficient size and complexity to justify modification as opposed to simply creating a new 
component from scratch. Since much of the component is new, in many ways this type of 
reuse may appear similar to new development. However, there are some important distinc- 
tions. The number of coded lines is likely to be reduced relative to newly developed code, so 
one might expect a decrease in error density. However, the extensive modification activity 
may be more error prone than standard component creation, since the original abstraction is 
being significantly altered. This mode of component creation may result in more of a “hack” 
than a well-conceived component. New types of errors may arise, such as removing too much 
or not enough of the old component. 


2.2 Language Issues in Software Reuse 

The Ada programming language contains a number of constructs that encourage effective 
reuse, including packages and generics [Ich85, WCW85, GP87, EG90]. A package is used to 
group a collection of declarations, such as types, variables, procedures and functions. The 
package construct allows for the encapsulation of related entities, encouraging the creation 
of well-defined abstractions such as encapsulated data types. For example, a stack package 
of a particular type can be created, containing the element type and operations such as push 
and pop. Through a simple modification of the element type, the package can be adapted 
to support operation on a different type. This would enable one to move toward the second 
type of reuse, tailoring the component slightly to suit the new application. 

Ada’s generic construct provides more support for verbatim reuse, as it enables the 
creation of more abstract entities. A generic program unit is a template for a module. 
Instantiation of the generic program unit yields a module. The generic units may be param- 
eterized, i.e., they may require the user to supply types or operations to create a module. 
This provides a great deal of flexibility in their use. For example, one may parameterize the 
stack package such that the user must supply the element type to create an instance of the 
stack. The generic stack can then be used without modification in support of a number of 
different types. 
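
As a rough illustration of the parameterized stack described above, the following sketch uses Python's typing generics as a loose analog of an Ada generic package. It is not Ada and is not taken from the projects studied; the class name `Stack` and the sample values are assumptions made for illustration. Note that Python does not enforce the element type at run time, whereas an Ada instantiation is checked by the compiler.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # element type, supplied by the user of the stack


class Stack(Generic[T]):
    """A stack parameterized by element type (loose analog of an Ada generic)."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


# Two "instantiations" of the same, unmodified component (verbatim reuse).
int_stack: Stack[int] = Stack()
name_stack: Stack[str] = Stack()

int_stack.push(1)
int_stack.push(2)
name_stack.push("attitude")  # hypothetical sample value
```

The same source text serves both element types, which is the property that makes such a unit a candidate for verbatim reuse rather than reuse with slight modification.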

High levels of reuse may be achieved in languages without such features; however, the 
approach taken to achieve such reuse will be different. Such differences were reported in a 
study comparing FORTRAN and Ada reuse in the NASA/SEL [BWS93]. The Ada approach 
was to develop a set of generics that can be instantiated to support a variety of application 
types. In contrast, the FORTRAN approach was to develop a collection of libraries specific to 
each application type. On projects within a very narrow domain, both approaches achieved 
similar high levels of reuse. However, when there was a significant change in the domain, 
the Ada approach achieved a sizable amount of reuse (50 percent verbatim reuse), while 
the FORTRAN approach showed less than 10 percent verbatim reuse [BWS93]. Thus it 
would appear that the parameterized, generic approach is better suited to development in a 
dynamic, evolving domain. 

While improved language features may help to enable reuse, they alone have not resulted 
in large-scale reuse in software development. There are other important factors involved: 
applications must be structured to allow and encourage reuse, and software organizations 
must be tailored to support a reuse-oriented development paradigm. 
Figure 1: Interaction of a Project Organization with the Component Factory 


2.3 Organizational Support for Reuse 

One model that integrates reuse into a development is the “component factory” organi- 
zation, which is a dual-organization structure consisting of two parts: a factory organization 
and a project organization. The factory organization provides software components in re- 
sponse to requests from the various projects being developed in the project organization 
[BCC92]. Figure 1 illustrates the component factory concept in support of a project orga- 
nization. In this setting, the development organization makes requests to the component 
factory to provide components to be integrated into the desired product. If the component 
factory is effective, the activity of component creation can be significantly reduced, and 
the quality of the components that are delivered to the integration team can be increased, 
reducing the costs of development and of rework. The key features of the component fac- 
tory are the repository of the components for future reuse, and the focus on flexibility and 
continuous improvement. Thus a measurement-oriented approach must be utilized, such 
as that proposed in the TAME project [BR88], which provides an experimental view of 
software development, allowing for analysis and learning about the effectiveness of the new 
technologies. 

Reuse-oriented development will require some effort to be expended in activities that 
are not a part of traditional software development. For example, although the component 
factory will allow the effort spent in component creation to be reduced, it will also require 
additional activity in searching for and selecting the appropriate component for the particular 
application. These new activities may also be a potential source of errors in the system, and 
thus a source of rework effort. Introducing an activity of selecting a component from a 
repository may introduce new types of errors, for example, selecting a component that does 
not provide the intended function. 
3 Using Error Analysis to Optimize the Development Process 


The Quality Improvement Paradigm provides a framework to build a continually im- 
proving organization relative to its evolving set of goals [Bas85, BR88]. The QIP consists of 
six steps: 

1. Characterize the current project and environment. 

2. Set goals for project performance and improvement. 

3. Choose processes, as well as models and metrics, appropriate for the project. 

4. Execute the processes, and collect the prescribed data, and provide real-time feedback 
for corrective action. 

5. Analyze the data to evaluate current practices and make recommendations for future 
improvement. 

6. Package the experience in a form suitable for reuse on future projects. 

The first two steps deal with determining the nature of the project, including goals for 
performance and improvement. Based on the characterization and goals, the third step 
selects the most suitable processes for the project; establishes the measurement plan, including 
choosing appropriate models and metrics; and sets up the mechanism for real-time feedback 
as the project progresses. The fourth step starts the selected processes, collects the data 
as prescribed by the measurement plan, and uses the selected models and metrics to provide 
feedback to the development organization. The fifth and sixth steps occur off-line, as the 
data is analyzed and packaged into the experience base for use in other projects. 

Examining the various dimensions of errors in an organization can yield important 
lessons learned that may be used to improve software development. The goal of error 
analysis is to learn about the nature of errors in the current environment so that improvement 
can be made (e.g., process tailoring) in subsequent projects, and feedback can be provided 
to the current project. Thus error analysis can be associated with either of the two feedback 
loops in the model: the project loop, occurring in step 4, in which the results are provided 
back to the project in real time, or the corporate loop, in steps 5 and 6, in which results are 
made available for subsequent projects in the organization. Our focus in this paper is on the 
corporate loop; i.e., the analysis and packaging steps for subsequent development, from the 
perspective of reuse-oriented software development. 

A number of recent studies have shown that product metrics can be used to determine 
the areas in a program that are at a greater risk of containing a fault [AE92, SP88, BBH93, 
BTH93, MK92]. These studies indicate that models can be developed to isolate faulty 
components in a system based on characteristics of the components and their environment. 
Our goal is to develop an understanding of the differences between traditional development 
methods and reuse-oriented methods in terms of the characteristics of their errors. Increased 
knowledge about the types of errors in an environment can be used to optimize the process 
for that environment. 

Basili and Selby found that the effectiveness of error detection techniques varies with 
the type of fault encountered [BS87]. For example, code reading was found to be the most 
effective technique for isolating interface errors, while functional testing was found to be 
more effective at finding logic errors. As such, a priori knowledge of the distribution of 
the type of errors allows one to select verification techniques most appropriate for that 
distribution. Suppose two thirds of the errors are interface errors, and one third logic errors. 
In this case, we would want to be sure to use techniques that are effective in finding interface 
errors. Given a limited budget for verification and validation, we may choose to expend more 
resources in code reading and fewer in functional testing. On the other hand, if a different 
project is much more likely to have logic errors than interface errors, it may be more effective 
to focus the verification activities on functional testing. 
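
The budget trade-off described above can be made concrete with a small sketch. The detection rates below are purely illustrative assumptions (they are not measurements from [BS87]), and the linear model in `expected_detection` is a deliberate simplification; only the two-thirds/one-third error mix comes from the text.

```python
# Hypothetical detection rates: the fraction of errors of each type that a
# technique is assumed to find. Illustrative values only, not from [BS87].
EFFECTIVENESS = {
    "code_reading": {"interface": 0.6, "logic": 0.3},
    "functional_testing": {"interface": 0.3, "logic": 0.6},
}

# Error mix from the example in the text: two thirds interface, one third logic.
ERROR_MIX = {"interface": 2 / 3, "logic": 1 / 3}


def expected_detection(error_mix, effort_split):
    """Expected fraction of all errors found.

    Simplifying assumption: detection scales linearly with the share of the
    verification budget given to each technique.
    """
    total = 0.0
    for technique, share in effort_split.items():
        for error_type, fraction in error_mix.items():
            total += share * EFFECTIVENESS[technique][error_type] * fraction
    return total


reading_heavy = expected_detection(
    ERROR_MIX, {"code_reading": 0.75, "functional_testing": 0.25}
)
testing_heavy = expected_detection(
    ERROR_MIX, {"code_reading": 0.25, "functional_testing": 0.75}
)
```

Under these assumed rates, the budget weighted toward code reading finds a larger expected share of the errors when interface errors dominate; reversing the error mix reverses the preference.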

Knowledge of when the errors are being introduced enables one to apply verification 
techniques at the most suitable time. If a large number of errors are being introduced in the 
design phase, adding design inspections to the development process may reduce the number 
of errors impacting later phases. On the other hand, if most errors are being introduced 
during coding, design inspections may not be as cost-effective. In this case, one may choose 
not to inspect design, but choose to have additional verification effort in the coding phase. 

The QIP can be used to take advantage of such knowledge. To incorporate this reuse 
information into the development process, we can develop a mapping to the QIP. The first 
step of the QIP, characterize the project, can be tailored to include determining the amount 
and type of reuse expected on the project. The second step, select appropriate models, can 
include selecting models of expected error profiles based on the characterization of reuse. 
The third step is to select the appropriate processes. Here, one can choose the processes 
expected to be most effective for the expected error distribution. The fourth and fifth steps 
are to execute the processes, collect data, and feed back the results. This can be seen as 
measuring the actual reuse profile, and measuring the effectiveness of the error mitigation 
strategies, and making a determination of whether to modify the selected processes based on 
the new information. For example, if the actual reuse profile is very different from original 
expectations, one should attempt to understand the factors that led to the difference, and, 
if appropriate, develop a new projection of the expected error profile. 


4 Description of the Analysis 


Since its origin, the NASA/GSFC SEL has collected a wealth of data from its software 
development [SEL94]. Selby performed a study on the characteristics of reused components 
on a collection of FORTRAN projects from this environment [Sel88], in which the level of 
reuse averaged 32 percent. Because of the support for reuse provided by the Ada language, as 
discussed in section 2.2, we chose to analyze the Ada projects in this environment. A much 
higher level of reuse than what was reported in [Sel88] has been achieved more recently in 
this environment [Kes90]. The high levels of reuse have been attributed in part to the Ada 
language constructs and object-oriented methods [Kes90, Sta93, BWS93]. More recently, 
however, even the FORTRAN systems have been showing high levels of reuse, although the 
nature of the reuse is different than reuse in the Ada development environment. 

Project   KSTMT   Pct. Total   Pct. Verbatim   Effort 
ID                Reuse        Reuse           (SM) 

A         27.1     31            4             175 
B         14.4     31           13              85 
C         13.7     38           19              72 
D         24.8     85           27             117 
E         13.8     97           88              30 
F         12.8     78           44              73 
G         13.7    100           89              16 

Table 1: Overview of the Examined Projects 

We analyzed a collection of seven medium-scale Ada projects from a narrow domain, as 
all are simulators which were developed at the NASA/GSFC Flight Dynamics Division. An 
overview of the projects examined is provided in Table 1. The projects ranged in size from 
61 to 184 thousand source lines, or 12.8 to 27.1 thousand Ada statements (KSTMT). They 
required development effort of 16 to 175 technical staff months. Reuse ranged from 4 to 89 
percent (verbatim), and from 31 to 100 percent (verbatim and with modification). 
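
The ranges quoted above can be recomputed from the rows of Table 1. The sketch below simply transcribes the table; the tuple layout and variable names are assumptions made for illustration.

```python
# Rows transcribed from Table 1:
# (project, KSTMT, pct. total reuse, pct. verbatim reuse, effort in staff months)
PROJECTS = [
    ("A", 27.1, 31, 4, 175),
    ("B", 14.4, 31, 13, 85),
    ("C", 13.7, 38, 19, 72),
    ("D", 24.8, 85, 27, 117),
    ("E", 13.8, 97, 88, 30),
    ("F", 12.8, 78, 44, 73),
    ("G", 13.7, 100, 89, 16),
]


def column_range(rows, index):
    """Return (minimum, maximum) of one column of the table."""
    values = [row[index] for row in rows]
    return min(values), max(values)


kstmt_range = column_range(PROJECTS, 1)
total_reuse_range = column_range(PROJECTS, 2)
verbatim_reuse_range = column_range(PROJECTS, 3)
effort_range = column_range(PROJECTS, 4)
```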

While this environment is not organized along the lines of the Component Factory 
discussed in section 2, it does have some characteristics in common with that organization. In 
the SEL, generalized architectures were developed explicitly to facilitate large scale reuse 
from project to project [Sta93], so it is clear that significant effort has been applied towards 
the goal of reuse in the organization. As such, new systems have been developed in accor- 
dance with the packaged experience of reusable architectures, designs and code. One aspect 
of the Component Factory organization is the separate organization that produces or re- 
leases all reusable software products [BCC92]. While this feature is not present in the SEL, 
it is apparent that less effort is being spent on project-specific development activities. The 
percentage of effort spent in the Coding/Unit Test phase has dropped from 44 percent on an 
early simulator, to only 18 percent on one of the more recent simulators [Sta93]. This sug- 
gests that there is a significant leveraging of the stored experience, and as such, the observed 
effort on the SEL projects is becoming more in line with the profile one would expect in the 
Component Factory’s project organization, i.e., dominated by design and testing activities. 

We developed a set of questions with which to compare newly created, modified, and 
reused verbatim components: 

1. What is the impact of reuse on error density? 

2. Are errors in reused units easier to isolate or correct? 

3. Are the errors typically being introduced at different phases? 

4. Are errors associated with reused units detected earlier in the lifecycle? 

5. Are there different kinds of errors associated with reused units? 

6. Are there structural differences between new and reused units? 

Component              No.               Pct. 
Origin                 Comp.    KSTMT    KSTMT 

New                    1095      44.2     36.5 
Extensively Modified    152       8.8      7.2 
Slightly Modified       517      21.6     17.8 
Reused Verbatim        1495      46.6     38.5 
All Components         3259     121.2    100.0 

Table 2: Profile of each class of component origin 

Several types of data were used in our analyses. The first type of data has to do 
with the origin of a component, that is, whether it was newly created or reused. At the time 
of component creation a form was filled out by the developer indicating the origin of the 
component: whether it was to be created new, reused from another component with extensive 
modification (more than 25 percent changed), reused with slight modification (less than 25 
percent changed), or reused verbatim (without change). Table 2 provides a summary of the 
number of components and source statements in each category of component origin. A larger 
amount of source code was created in the new and reused verbatim categories than in either 
of the categories of reuse with modification. 
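
The four origin classes can be written as a simple decision rule. In the study the class was recorded by the developer on a form, so the function below is only an illustrative codification. The text does not say how a component with exactly 25 percent changed was classified, so treating it as extensively modified is an assumption.

```python
def classify_origin(reused: bool, pct_changed: float) -> str:
    """Map a component to one of the four origin classes used in the study.

    pct_changed is the percentage of the original component that was changed;
    it is ignored for components that were not reused.
    """
    if not reused:
        return "new"
    if not 0 <= pct_changed <= 100:
        raise ValueError("pct_changed must be between 0 and 100")
    if pct_changed == 0:
        return "reused verbatim"
    if pct_changed < 25:
        return "slightly modified"
    # Assumption: the boundary case of exactly 25 percent is treated as extensive.
    return "extensively modified"
```

For example, `classify_origin(True, 10)` yields the slightly modified class, while `classify_origin(True, 40)` yields the extensively modified class.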

The SEL uses “Change Report Forms” to collect data on changes to components for 
various reasons, such as error corrections, requirements changes, and planned enhancements. 
In this analysis, we examined the changes made to correct errors. For each reported error, the 
form identifies the modules that needed to be changed, the source of the error (requirements, 
functional specification, design, code, or previous change), the type of the error (initialization, 
computational, data value, logic, internal interface, or external interface), and whether or 
not the error was one of omission (something was not done) or commission (something was 
done incorrectly). 
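
The error-related fields of the form described above can be pictured as a small record type. The following is a sketch only: the category values come from the text, but the class names, field names, and the sample module names are hypothetical and do not reproduce the actual SEL Change Report Form.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ErrorSource(Enum):
    REQUIREMENTS = "requirements"
    FUNCTIONAL_SPECIFICATION = "functional specification"
    DESIGN = "design"
    CODE = "code"
    PREVIOUS_CHANGE = "previous change"


class ErrorType(Enum):
    INITIALIZATION = "initialization"
    COMPUTATIONAL = "computational"
    DATA_VALUE = "data value"
    LOGIC = "logic"
    INTERNAL_INTERFACE = "internal interface"
    EXTERNAL_INTERFACE = "external interface"


class ErrorMode(Enum):
    OMISSION = "omission"      # something was not done
    COMMISSION = "commission"  # something was done incorrectly


@dataclass(frozen=True)
class ErrorReport:
    """One reported error and the modules changed to correct it."""
    modules_changed: Tuple[str, ...]
    source: ErrorSource
    error_type: ErrorType
    mode: ErrorMode


# Hypothetical example record; the module names are invented.
report = ErrorReport(
    modules_changed=("sensor_model", "telemetry_io"),
    source=ErrorSource.DESIGN,
    error_type=ErrorType.INTERNAL_INTERFACE,
    mode=ErrorMode.OMISSION,
)
```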

Finally, we analyzed the systems with a source code static analysis tool, ASAP [Dou87], 
which provided us with a static profile of each compilation unit, including, for example, basic 
complexity measures such as McCabe’s Cyclomatic Complexity and Halstead’s Software 
Science, as well as counts of various types of declarations and statement usage. ASAP 
also identifies all with statements, so we were able to develop measures of the external 
declarations visible to each unit. 


5 Results of the Analysis 


This section presents the major findings from our analysis. We used non-parametric 
statistical methods to test the hypotheses that there were significant differences among the 
classes of component origin in terms of the nature and impact of the errors in each class. 
Structural characteristics of the components are discussed in 5.1, and the remaining sections 
describe findings associated with the various dimensions of errors. 

Component              Ave. No.      Ave. No.      Ave. No. 
Origin                 Statements    Parameters    Withs 

New                    45.8          2.1           3.5 
Extensively Modified   59.9          2.1           7.5 
Slightly Modified      41.6          1.9           4.0 
Reused Verbatim        24.5          2.8           1.1 
All Components         36.8          2.3           2.7 

Table 3: Structural Characteristics of Subprogram Bodies 

5.1 Structural Characteristics 

Table 3 shows a collection of measures that characterize the structure of compilation 
units by class of reuse. Only compilation units that are subprogram bodies were considered, 
so as not to bias the results with characteristics of instantiations or package specifications. 
The average number of Ada statements provides an indication of the typical size of a compo- 
nent. The number of parameters is a rough measure of the generality of a component. The 
number of context couples (i.e., the number of “with” statements) provides an indication of 
the external dependencies of a particular unit. 

What we see is that the reused verbatim components are simpler in terms of their size and 
external dependencies, as evidenced by the number of source statements and with statements. 
The reused verbatim units average 24.5 statements and 1.1 withs per unit, while the new units 
average 45.8 statements and 3.4 withs per unit. The extensively modified units tend to be the 
most complex, as they average 59.9 statements and 7.5 withs per unit. The slightly modified 
units tend to be slightly smaller than the new units, but with roughly the same number of 
external dependencies. It is interesting to note that the extensively modified components 
are the most complex, both in terms of their size and external complexity. These results are 
similar to what was reported by Selby in his analysis of reuse in a collection of FORTRAN 
systems: the reused components tend to be simpler than newly created components in terms 
of size and interaction with other modules [Sel88]. This additional complexity may result 
in an increase in difficulty associated with these components in terms of their error density 
and error correction effort. 

We did note one result that is in contrast to Selby’s study. He reported that the verbatim 
reused modules tend to have a smaller interface than newly created units. We observed the 
opposite: the verbatim reused modules tend to have more parameters than either the 
modified or new components. The verbatim reused components averaged 2.8 parameters per 
unit, versus 1.9 to 2.1 in the new and modified components. This difference is significant at 
the 0.01 level (i.e., if there were actually no difference between the classes, a difference this 
large would be expected less than one percent of the time). Units that are more highly 
parameterized have an increased generality 
that may allow them to be more readily integrated into new applications. As such, we should 
expect to see a greater number of parameters in the unchanged modules. This difference 
may be indicative of the approach being taken to reuse in the environment. As previously 
noted, the Ada approach in this environment was based on the use of well-parameterized 
generics, while the FORTRAN approach was based on libraries of more specialized functions 
[BWS93]. As such, we might expect a lower level of parameterization in reused FORTRAN 
modules. Another reason for the difference from Selby’s study may be that his measure of a 
module’s interface is a sum of counts of the parameters and global references in the module. 
In the FORTRAN modules that he examined, this sum is likely to be dominated by the 
count of global references; as such, the variation in the count of subprogram parameters 
among the classes of reuse cannot be observed. 

Project    Ave. No.      Ave. No.    Ave. No. 
           Statements    Withs       Params. 

A          15            0.3         1.9 
B          14            0.2         1.8 
C          14            0.2         1.8 
D          18            0.9         2.7 
E          31            1.1         3.0 
F          26            1.2         2.1 
G          26            1.5         3.1 

Table 4: Structural Characteristics in Verbatim Reused Components as Reuse Increases 

Table 4 shows the profile of the reused components over time, as the projects are listed 
in chronological order of their development start date. We see an increasing complexity (ex- 
pressed both in terms of module size and external dependencies) in the reused components. 
Also, we see a rise in the number of parameters per subprogram in the verbatim units, sug- 
gesting an increasing generality among them. Low level utility functions were the first to 
be reused, but as the organization gained reuse experience, more and more complex units 
were reused as well. Thus while utility functions may be among the best components to 
initially stock a repository, a reuse process is not limited to them. As an organization gains 
experience, more and more complex units, at higher levels of the application hierarchy may 
be reused. 
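
One simple way to see the trend in Table 4 is to compare the earliest projects with the latest ones. The sketch below transcribes the table; grouping the first three and last three projects is an arbitrary choice made here for illustration, not a method used in the study.

```python
from statistics import mean

# Rows transcribed from Table 4, in chronological order of development start:
# (project, ave. statements, ave. withs, ave. parameters) for verbatim reused units
VERBATIM_PROFILE = [
    ("A", 15, 0.3, 1.9),
    ("B", 14, 0.2, 1.8),
    ("C", 14, 0.2, 1.8),
    ("D", 18, 0.9, 2.7),
    ("E", 31, 1.1, 3.0),
    ("F", 26, 1.2, 2.1),
    ("G", 26, 1.5, 3.1),
]

early = VERBATIM_PROFILE[:3]   # assumption: first three projects as "early"
late = VERBATIM_PROFILE[-3:]   # assumption: last three projects as "late"


def column_mean(rows, index):
    """Average of one measure over a group of projects."""
    return mean(row[index] for row in rows)


trend = {
    name: (column_mean(early, index), column_mean(late, index))
    for index, name in [(1, "statements"), (2, "withs"), (3, "parameters")]
}
```

Each entry of `trend` holds the early and late group averages for one measure; all three measures are higher in the later group, consistent with the increasing size, external dependencies, and parameterization noted above.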


5.2 Error Density 

Table 5 shows the error and defect densities (errors or defects per thousand source 
statements) observed in each of the four classes of component origin. We use error to refer to 
a change report in which the reason for the change was attributed to an error correction. 
A change report can list several components as requiring correction due to a single error. 
We refer to each instance of a component requiring modification due to an error as a defect. 
As such, there can be several defects associated with a single error. Two measures of error 
density are shown: the first includes all errors from unit test through acceptance test, while 
the second only includes those detected in system and acceptance test. The first measure 
can provide an indication of the total amount of rework, while the second shows the amount 
that is occurring late in the development life-cycle. The measure of defect density shown in 
the table includes defects from unit through acceptance test. 

Component              No.               Defect     Error      S/A Err. 
Origin                 Comp.    KSTMT    Density    Density    Density 

New                    1095      44.2    24.8       13.0       8.4 
Extensively Modified    152       8.8    19.5       14.0       8.9 
Slightly Modified       517      21.6    10.5        7.4       2.5 
Reused Verbatim        1495      46.6     2.1        1.2       0.7 
All Components         3259     121.2    13.1        7.6       4.4 

Table 5: Error densities in each class of component origin 
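
The distinction between errors and defects can be sketched as follows: each change report counts as one error, and each component listed on it counts as one defect. The function name and the sample data are hypothetical, and the sketch ignores the question of how a report spanning several origin classes is attributed.

```python
def error_and_defect_density(reports, kstmt):
    """Return (error density, defect density) per thousand statements.

    reports: one entry per error-correcting change report, each entry being
             the list of components that had to be modified.
    kstmt:   size of the code under consideration, in thousands of statements.
    """
    errors = len(reports)                                   # one error per report
    defects = sum(len(components) for components in reports)  # one defect per component
    return errors / kstmt, defects / kstmt


# Hypothetical data: three errors touching one, three, and two components.
sample_reports = [
    ["orbit_propagator"],
    ["sensor_model", "telemetry_io", "clock"],
    ["clock", "scheduler"],
]
error_density, defect_density = error_and_defect_density(sample_reports, kstmt=2.0)
```

Because one report may list several components, defect density is never lower than error density computed over the same code.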

We used a non-parametric test to obtain a statistical comparison of component error 
density by class of component origin. This comparison shows a significantly lower error den- 
sity among the reused verbatim components compared to each of the other classes. Similarly, 
there is a significant difference between the slightly modified components, and the new and 
extensively modified components. No significant difference was observed between new and 
extensively modified components. 
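
The text does not name the specific non-parametric test that was applied. As one common choice, the sketch below computes a rank-based U statistic (in the style of a Mann-Whitney test) and an exact two-sided permutation p-value. Both the choice of test and the sample densities are assumptions made for illustration; the numbers are not data from the study.

```python
from itertools import combinations


def u_statistic(xs, ys):
    """Count pairs (x, y) with x < y, counting ties as one half."""
    return sum(
        1.0 if x < y else 0.5 if x == y else 0.0
        for x in xs
        for y in ys
    )


def exact_p_value(xs, ys):
    """Two-sided exact permutation p-value for the U statistic.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(xs) + list(ys)
    expected = len(xs) * len(ys) / 2.0
    observed = abs(u_statistic(xs, ys) - expected)
    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), len(xs)):
        chosen = set(indices)
        group_a = [pooled[i] for i in indices]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_a, group_b) - expected) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-component error densities (errors per KSTMT).
verbatim_sample = [0.0, 1.1, 0.5, 2.0, 0.8]
new_sample = [9.5, 14.2, 11.0, 20.3, 7.8]
p_value = exact_p_value(verbatim_sample, new_sample)
```

A rank-based test of this kind makes no assumption about the shape of the density distribution, which is useful when many components have zero reported errors.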

In terms of error density, reuse via extensive modification appears to yield no advan- 
tage over new code development. There is a benefit from reuse in terms of reduced error 
density when the reuse is verbatim or via slight modification. However, reuse through slight 
modification only shows about a 50 percent reduction in total error density, while verbatim 
reuse results in more than a 90 percent reduction. When we only look at the errors that 
are encountered during the system and acceptance test phases, we still see a greater than 
90 percent reduction in error density in the reused verbatim class (0.7 errors per KSTMT, 
compared to 8.4 errors per KSTMT in the new components). The slightly modified 
components, with 2.5 errors per KSTMT, show a reduction of nearly 70 percent compared to 
the new components, with 8.4 errors per KSTMT. Verbatim reuse clearly provides the most 
significant benefit to the development process in terms of reducing error density, but reuse 
via slight modification also provides a substantial improvement, one which is even more 
noticeable in the test phases. 
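
The reductions discussed above follow directly from the error densities in Table 5. The sketch below transcribes the two error density columns and computes the percentage reduction relative to new components; the dictionary layout and function name are assumptions.

```python
# Error densities transcribed from Table 5 (errors per thousand statements):
# origin -> (all test phases, system and acceptance test only)
ERROR_DENSITY = {
    "new": (13.0, 8.4),
    "extensively modified": (14.0, 8.9),
    "slightly modified": (7.4, 2.5),
    "reused verbatim": (1.2, 0.7),
}


def reduction_vs_new(origin, column):
    """Percentage reduction in error density relative to new components.

    column 0 selects all errors; column 1 selects system/acceptance test errors.
    A negative result means the density is higher than for new components.
    """
    baseline = ERROR_DENSITY["new"][column]
    return 100.0 * (baseline - ERROR_DENSITY[origin][column]) / baseline


reductions = {
    origin: (
        round(reduction_vs_new(origin, 0), 1),
        round(reduction_vs_new(origin, 1), 1),
    )
    for origin in ERROR_DENSITY
    if origin != "new"
}
```

With the table values, verbatim reuse gives reductions of about 91 and 92 percent, slight modification about 43 and 70 percent, and extensive modification a small increase in both columns.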

A number of studies have found higher defect/error densities in smaller components than 
in larger components [BP84, SYTP85, LV89, MP93]. As shown in Table 6, our data supports 
their findings. Small components (25 or fewer statements) have a defect density more than 
twice that of the larger components (more than 25 statements), and this difference is highly 
significant. The only class of reuse where we saw no significant difference was the reused 
verbatim components, as they have nearly the same defect density regardless of size. The defect 
density in the small components was more than twice that of the larger components in the 
new and extensively modified classes, and nearly four times greater in the slightly modified 
class. One explanation for higher error density in the small components is that a system 
composed of small components will have more interfaces than a system composed of large 
components, and interfaces are frequently noted as a major source of error in development. 

                       Small                      Large 
Component Origin       No. Comp.    Def. Dens.    No. Comp.    Def. Dens. 

New                     638         49.8           457         19.8 
Extensively Modified     67         35.7            85         17.7 
Slightly Modified       283         26.5           234          7.4 
Reused Verbatim         952          2.3           543          2.0 
All Components         1940         22.6          1319         10.9 

Table 6: Relationship of defect density and component size 
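
The size effect can be checked by dividing the small-component defect density by the large-component defect density for each class in Table 6. The sketch below transcribes the two density columns; the names used are assumptions.

```python
# Defect densities transcribed from Table 6 (defects per thousand statements):
# origin -> (small components, large components)
DEFECT_DENSITY_BY_SIZE = {
    "new": (49.8, 19.8),
    "extensively modified": (35.7, 17.7),
    "slightly modified": (26.5, 7.4),
    "reused verbatim": (2.3, 2.0),
    "all components": (22.6, 10.9),
}


def small_to_large_ratio(origin):
    """Ratio of defect density in small components to that in large components."""
    small, large = DEFECT_DENSITY_BY_SIZE[origin]
    return small / large


ratios = {origin: small_to_large_ratio(origin) for origin in DEFECT_DENSITY_BY_SIZE}
```

The ratios come to roughly 2.5 for new, 2.0 for extensively modified, and 3.6 for slightly modified components, while the reused verbatim class stays close to 1.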

5.3 Error Isolation/Completion Difficulty 

Basili and Perricone, in their study of a FORTRAN development project, reported 
that modified components typically required more correction effort than new components 
[BP84]. We see a similar result in the two classes of modified components, and also see the 
same pattern occurring in the reused verbatim components. Table 7 shows the percentage 
of errors in each class of reuse that were categorized as difficult to isolate or difficult to 
complete (defined as more than one day to isolate or complete, resp.), and the relative 
rework effort, a crude approximation of relative effort (staff-hours per KSTMT) in isolating 
and correcting these errors. In terms of effort to isolate, we see little difference among 
the classes of component origin. Newly created components had the smallest percentage 
of difficult-to-isolate errors, but it was not significantly different from any of the classes of 
reused components. This result is not surprising, as the isolation activity is associated more 
with understanding the intended functions rather than with their implementation. As such, 
the origin of the components may not have as great an impact on isolation effort as it will 
have on completion effort. 

We do see an increase in the effort to complete an error in reused components relative 
to new components. The new components had the lowest percentage of errors requiring 
more than 1 day to complete a change and the reused verbatim components had the highest 
percentage, while the modified components fell in between. The difference between the new 
and the reused verbatim components is significant at the 0.05 level. One explanation for 
this effect is that the developers have a greater familiarity with the newly created 
components, so less time is needed to understand the components that must be changed. Another 
explanation is that the majority of the “easy” errors had previously been removed from the 
reused component, leaving only the more difficult ones. 

To determine whether the increased error correction cost in the reused components 
outweighs the benefit of their having fewer errors, we computed a rough measure of the amount 
of error rework expended in each class. Unfortunately, our data for effort spent in error 
Component                        No.     Pct. Diff.  Pct. Diff.   Rel. Rework
Origin                 KSTMT   Errors   Isolation   Completion   Effort

New                     44.2     574      12.4        10.1         118.3
Extensively Modified     8.8     124      14.5        17.7         157.4
Slightly Modified       21.6     160      13.8        13.1          76.8
Reused Verbatim         46.6      58      14.3        22.4          14.7
All Components         121.2     916      13.2        12.6          73.9


Table 7: Difficulty in error isolation/correction 


correction and isolation is categorical, so we approximated the true effort simply by the 
midpoint of the category ($\mu$). Rework was then computed as the sum of this approximation 
over all errors. Our relative rework measure (RR) was computed by dividing rework by the 
number of statements (S), i.e.: 

$$ RR = \frac{\sum \mu}{S} $$
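
As a minimal sketch of the relative rework computation, the Python below sums category midpoints over a list of errors and divides by the size in KSTMT. The effort categories and their midpoints are hypothetical placeholders, since the categories actually recorded are not listed in this section; in particular, the value given to the open-ended top category is an arbitrary assumption. Adding isolation and completion effort for each error is also an interpretation of the text, not a confirmed detail.

```python
# Hypothetical effort categories with midpoints in staff-hours. These are
# assumptions for illustration; the study's actual categories are not given.
CATEGORY_MIDPOINT_HOURS = {
    "one_hour_or_less": 0.5,
    "one_hour_to_one_day": 4.5,
    "one_to_three_days": 16.0,
    "more_than_three_days": 32.0,  # open-ended category: arbitrary stand-in
}


def relative_rework(errors, kstmt):
    """Approximate rework (sum of category midpoints) divided by size.

    `errors` is a list of (isolation_category, completion_category) pairs,
    and `kstmt` is the size in thousands of statements.
    """
    if kstmt <= 0:
        raise ValueError("size must be positive")
    rework = sum(
        CATEGORY_MIDPOINT_HOURS[isolation] + CATEGORY_MIDPOINT_HOURS[completion]
        for isolation, completion in errors
    )
    return rework / kstmt
```

For example, two errors with midpoints totalling 21.5 staff-hours in a class of 2.0 KSTMT give a relative rework of 10.75 staff-hours per KSTMT.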


Again, we used a nonparametric test to determine whether there is a significant 
difference in the relative rework effort among the four classes of component origin. The tests 
found a significant difference among the classes with one exception. When comparing the 
extensively modified components and the new components we found the level of significance 
to be only 0.18. There may be an increase in the rework cost of extensively modified 
components; however, our data do not confirm this. In any event, it is not clear whether such 
an increase in rework cost would be offset by the expected benefit of reduced component 
creation cost. 

For all other pairs, the result was significant at the 0.01 level. Reuse via slight 
modification shows a 35 percent reduction in rework cost over newly created components, while 
verbatim reuse provides an 88 percent reduction. For these modes of reuse, the benefit of 
fewer errors clearly outweighs the cost of more difficult error correction. This measure of 
benefit is somewhat conservative, as it does not account for the expected reduction in 
component creation cost, or for the impact of errors as “obstacles” in the development process 
(e.g., the cost of delays due to effort spent correcting errors). As such, we expect these modes 
of reuse to yield an even greater improvement over new development. This shows that there 
is a shift in costs of reuse compared to traditional development, with the reuse-oriented 
development showing less development effort and fewer, but more costly, errors. 
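
The quoted reductions follow directly from the relative rework column of Table 7. A small Python sketch of that arithmetic is given here; the names are illustrative assumptions and the figures are copied from the table.

```python
# Relative rework effort (staff-hours per KSTMT) copied from Table 7.
# RELATIVE_REWORK and reduction_vs_new are illustrative names.
RELATIVE_REWORK = {
    "New": 118.3,
    "Extensively Modified": 157.4,
    "Slightly Modified": 76.8,
    "Reused Verbatim": 14.7,
}


def reduction_vs_new(origin):
    """Percent reduction in relative rework compared with new components.

    A negative result means the class needs more rework than new components.
    """
    baseline = RELATIVE_REWORK["New"]
    return 100.0 * (baseline - RELATIVE_REWORK[origin]) / baseline
```

Rounded to whole percentages, this gives the 35 percent (slightly modified) and 88 percent (reused verbatim) reductions cited above, and a negative value for the extensively modified class, for which the difference from new components was not statistically significant.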


5.4 Source of Errors 


Understanding the activity in which the error is introduced allows for corrective action 
to be applied at the appropriate time. Table 8 shows, for each class of component origin, 
the percentage of errors from each error source (when the error was introduced). Across all 
Component              Rqmts. or                      Previous   Any
Origin                 Fun. Spec.   Design    Code    Change     Error

New                       7.3        16.8     68.1      7.8      100
Extensively Modified      5.6        20.2     59.7     14.5      100
Slightly Modified         4.4        26.9     60.1     10.6      100
Reused Verbatim           3.4         3.4     74.1     19.0      100
All Components            5.7        18.2     66.1     10.0      100


Table 8: Percentage of errors in each class of error source by class of reuse 


classes, coding errors are the most common error; however, errors associated with require- 
ments, functional specification and design occur at a slightly higher rate in new components 
than in reused components. The Basili-Perricone study reported the opposite effect of reuse 
on the specification errors [BP84]. They found that modified modules had a higher propor- 
tion of specification errors than did the new modules, and explained the result by suggesting 
that the specification was not well-enough or appropriately defined to be used in different 
contexts. A similar result was reported by Endres [End75]. A difference from the environ- 
ments examined in those studies is that reuse has been well planned for in this environment. 
The organization is not structured as a pure “component factory” as described in section 3, 
but it is moving in that direction. As such, the architecture, design and specifications have 
improved in this environment to better allow and encourage reuse. This result suggests that 
the reused functionality is more likely to be well specified. This is not surprising, since the 
reused components have been specified previously, with the expectation that they would be 
reused. As such, any specification errors are more likely to affect new components rather 
than reused components. The result also indicates that reuse, whether formal or informal, 
is occurring in this environment at a higher level than simply code. 

A second item of interest is the increased percentage of design errors in the modified 
components. This suggests that there is increased difficulty in designing an adaptation of 
an existing component to a new role. This is more difficult because the reuser must be 
concerned with two pieces of information: the intended function and the existing function. 
In creating a new component, one only needs to be concerned with the intended function. 
A misunderstanding of the existing function can result in an error, and that error is likely 
to be attributed to the design. 

5.5 Time of Error Detection 

Errors detected late in the development life-cycle can have a much greater cost than 
those detected early. Table 9 shows, by class of component origin, the percentage of all errors 
and the more difficult errors that escape unit test. Across all errors, we see little difference 
between the classes of new, extensively modified, and reused verbatim components, as nearly 
two thirds of the errors in these classes escaped unit test. This is significantly higher than 
what we observed in the slightly modified components, where only 43 percent escaped unit 
Component              Pct. All   Pct. Diff.   Pct. Diff.
Origin                 Errors     Isolation    Completion

New                      69          86           80
Extensively Modified     66          81           87
Slightly Modified        43          74           58
Reused Verbatim          62         100          100
All Components           64          84           78


Table 9: Percentage of errors that escape unit test 


Component              Error of              Error of
Origin                 Omission    Both      Commission    Any

New                      35.4      28.6        36.0        100
Extensively Modified     40.3      29.4        30.3        100
Slightly Modified        39.6      20.8        39.6        100
Reused Verbatim          26.3      26.3        47.3        100
All Components           36.2      27.2        36.6        100


Table 10: Percentage of errors of omission and commission 


test. 

Of the difficult isolation errors (those taking more than one day to isolate), there is not 
much difference among the classes; a relatively high percentage of these errors escape in all 
classes. However, again, the slightly modified components do show the lowest percentage. 
There is a significant reduction in the slightly modified class in the percentage of 
difficult-to-complete errors that escape unit test, as only 58 percent of these errors escape unit test, 
compared to 80 to 100 percent in the other classes. This suggests that the verification process 
is more effective in eliminating the difficult errors for the slightly modified components than 
for other modes of component creation. 


5.6 Nature of the Errors 


Table 10 shows the percentage of errors that were classified as one of omission, 
commission, or both. An error associated with a component that was reused verbatim is more 
likely to be an error of commission, and less likely to be one of omission. This suggests that the 
reused component was typically complete, i.e., it contained the necessary functionality, but 
at times was in error. 

Extensively modified components are more likely to have errors of omission than errors 
of commission. This may be an indication of the greater complexity of these components. 
Another possible explanation is that in the development of these components, the intended 
Component
Origin                 Procedural   Interface   Data    All

New                       41.2        14.1      44.6    100
Extensively Modified      47.6        17.7      34.7    100
Slightly Modified         31.8        31.2      36.9    100
Reused Verbatim           48.2        12.1      39.7    100
All Components            40.9        17.5      41.6    100


Table 11: Percent of errors of each type by class of component origin 


function was not so clear, resulting in necessary parts being omitted. Additional review 
of the completeness of the design of these components may be a means for removing these 
errors at an earlier stage. 

New and extensively modified components have a higher rate of errors that are classified 
as both omission and commission than do the slightly modified or reused verbatim 
components. This may be due to the nature of new development: it is more likely to result in a 
complex error. 


5.7 Type of Errors 

Table 11 shows the percentage of errors that were classified in each of the three classes: 
procedural, interface, and data. Procedural errors are those that were classified as either 
a computational or a logic error, interface errors are those that were classified as either an 
internal or external interface error, and data errors are those that were classified as either 
an initialization or a data value error. 

We see a significant difference in the distribution of error types in the slightly modified 
components, as they have a much higher frequency of interface errors than any other class. 
This suggests that the nature of the modifications is likely to be associated with the interface. 
We also see that the new components are more likely to have data errors than the reused 
components. Basili and Perricone found the opposite effect, namely, that the modified 
components had a greater percentage of data errors than did the new components. These 
results suggest that a different approach has been taken toward reuse. In the FORTRAN 
project studied by Basili and Perricone, the approach may have been to tailor data values 
and initialization to adapt the component to the new application. The approach taken in 
the Ada environment is to create generalized modules that can be parameterized to create 
instances suitable for the new application. As such, one might expect fewer data errors in 
reused components in the Ada environment. 
6 Conclusions 


In this analysis we observed clear benefits from reuse, for example, reduced error density. 
We found that verbatim reuse provides a substantial improvement in error density (more 
than a 90 percent reduction) compared to new development. The other modes of reuse did 
not approach this level of improvement. Reuse via slight modification offered a 50 percent 
reduction in error density compared to new development, but the improvement with this 
mode of reuse was greater in errors detected late in development (a 70 percent reduction). 

We observed a shift in costs of reuse-oriented development, with the reuse offering fewer, 
but more difficult errors. The effect of increased difficulty in error correction was apparent 
across the three modes of reuse, although it was less evident in the slightly modified com- 
ponents. In both the verbatim and slightly modified classes of reuse, the relative amount 
of rework was less than in new code. This suggests that while there is a cost of increased 
correction effort per error associated with such reuse, the cost is outweighed by the benefit 
of the reduced number of errors. Coupled with the reduction in development effort, these 
modes of reuse appear to offer a substantial benefit to development. 

Reuse via extensive modification does not provide the reduction in error density that 
the other modes of reuse yield, and it also results in errors that typically were more difficult 
to isolate and correct than the errors in newly developed code. In terms of the rework due 
to the errors in these components, it appears that this mode of development is more costly 
than new development. However, extensive modification may offer savings in development 
effort that outweigh the increased cost of rework. This remains an issue for further study. 

A different profile of errors was observed for different modes of reuse. For example, a 
greater percentage of design errors were observed in the modified components. The observed 
increase in design errors may be due to errors in the additional activities of understanding the 
function and implementation of the component to be modified, as well as due to the fact that 
less code was being written. Such information can be used to help in selecting appropriate 
verification methods for projects where there is significant reuse via modification. One may 
want to increase the effort in design reviews on such projects, while on projects dominated 
by new development, code reviews may receive more emphasis. This finding also suggests 
that one might want to investigate techniques to better describe the components stored 
in the experience base so that the likelihood of a misunderstanding of the function and 
implementation is lessened. 

The experience with reuse in an organization and the approach taken toward reuse are 
likely to influence the nature of errors observed in the organization. In this study of an 
organization well experienced with reuse, we observe a number of effects that differed from 
findings of other studies of environments where reuse was not planned for to such an 
extent. The reused components appear to be simpler, have fewer dependencies, and be more 
parameterized than new components. However, as this organization gained reuse experience, 
the distinction became less apparent: more and more complex components, at higher levels 
in the application hierarchy, were reused. As an organization moves toward a reuse-oriented 
development approach, it must evolve its practices to accommodate the new effects of reuse. 
In the context of the QIP, error analysis can be a useful mechanism to provide insight into 
the benefits and difficulties of reuse in software development. 
Abstract 

This paper presents the results of a study conducted at the University of Maryland in which we 
experimentally investigated the suite of Object-Oriented (OO) design metrics introduced by 
[Chidamber&Kemerer, 1994]. In order to do this, we assessed these metrics as predictors of 
fault-prone classes. This study is complementary to [Lie&Henry, 1993] where the same suite of 
metrics had been used to assess frequencies of maintenance changes to classes. To perform our 
validation accurately, we collected data on the development of eight medium-sized information 
management systems based on identical requirements. All eight projects were developed using a 
sequential life cycle model, a well-known OO analysis/design method and the C++ programming 
language. Based on experimental results, the advantages and drawbacks of these OO metrics are 
discussed and suggestions for improvement are provided. Several of Chidamber&Kemerer’s OO 
metrics appear to be adequate to predict class fault-proneness during the early phases of the 
life-cycle. We also showed that they are, on our data set, better predictors than “traditional” code 
metrics, which can only be collected at a later phase of the software development processes. 

Key-words: Object-Oriented Design Metrics; Error Prediction Model; Object-Oriented Software 
Development; C++ Programming Language. 
1. Introduction 

1.1 Motivation 

The development of a large software system is a time- and resource-consuming activity. Even with 
the increasing automation of software development activities, resources are still scarce. Therefore, 
we need to be able to provide accurate information and guidelines to managers to help them make 
decisions, plan and schedule activities, and allocate resources for the different software activities 
that take place during software evolution. Software metrics are thus necessary to identify where the 
resource issues are; they are a crucial source of information for decision-making [Harrison, 1994]. 

Testing of large systems is an example of a resource- and time-consuming activity. Applying equal 
testing and verification effort to all parts of a software system has become cost-prohibitive. 
Therefore, one needs to be able to identify fault-prone classes so that testing/verification effort can 
be concentrated on these classes [Harrison, 1988]. The availability of adequate product design 
metrics for characterizing error-prone classes is thus vital. 

Dozens of product metrics have been proposed [Fenton, 1991], used, and, sometimes, 
experimentally validated in academia [Basili&Hutchens, 1982] and industry, e.g., number of lines 
of code, McCabe complexity metric, etc. In fact, many companies have built their own cost, 
quality and resource prediction models based on product metrics. TRW [Boehm, 1981], the 
Software Engineering Laboratory (SEL) [McGarry et al., 1994] and Hewlett Packard [Grady, 
1994] are examples of software organizations that have been using product metrics to build their 
cost, resource, defect, and productivity models. 

1.2 Issues 

In the last decade, many companies have started to introduce Object-Oriented (OO) technology into 
their software development environments. OO analysis/design methods, OO languages, and OO 
development environments are currently popular worldwide in both small and large software 
organizations. The insertion of OO technology in the software industry, however, has created new 
challenges for companies which use product metrics as a tool for monitoring, controlling and 
improving the way they develop and maintain software. Therefore, metrics which reflect the 
specificities of the OO paradigm must be defined and validated in order to be used in industry. 
Some studies have concluded that “traditional” product metrics are not sufficient for characterizing, 
assessing and predicting the quality of OO software systems. For example, based on a study at 
Texas Instruments, [Brooks, 1993] has reported that McCabe cyclomatic complexity appeared to 
be an inadequate metric for use in software development based on OO technology. 

To address this issue, OO metrics have recently been proposed in the literature [Abreu&Carapuça, 
1994; Chidamber&Kemerer, 1994]. However, most of them have not undergone a thorough and 
comprehensive experimental validation. [Briand et al., 1994] and [Lie&Henry, 1993] are rare 
exceptions in this respect. The work described in this paper is an additional step toward a thorough 
experimental validation of the OO metric suite defined in [Chidamber&Kemerer, 1994]. This paper 
presents the results of a study conducted at the University of Maryland in which we performed an 
experimental validation of that suite of OO metrics with regard to their ability to identify 
fault-prone classes. Data were collected during the development of eight medium-sized management 
information systems based on identical requirements. All eight projects were developed using a 
sequential life cycle model, a well-known Object-Oriented analysis/design method [Rumbaugh et 
al, 1991], and the C++ programming language [Stroustrup, 1991]. In fact, we used an experiment 
framework that should be representative of currently used technology in industrial settings. This 
study discusses the strengths and weaknesses of the validated OO metrics with respect to 
predicting faults across classes. 

1.3. Outline 

This paper is organized as follows. Section 2 presents the suite of OO metrics proposed by 
Chidamber&Kemerer (1994), and the methodology we used for experimental validation. Section 3 
presents the data collected together with the statistical analysis of the data. Section 4 compares our 
study with other works on the subject. Finally, section 5 concludes the paper by presenting lessons 
learned and future work. 

2. Description of the Study 

2.1. Experiment goal 

The goal of this study was to analyze experimentally the OO design metrics proposed in 
[Chidamber&Kemerer, 1994] for the purpose of evaluating whether or not these metrics are 
suitable for predicting the probability of detecting faulty classes. From [Chidamber&Kemerer, 
1994], [Chidamber&Kemerer, 1995] and [Churcher&Shepperd, 1995], it is clear that the 
definitions of these metrics are not language independent. As a consequence, we had to slightly 
adjust some of Chidamber&Kemerer’s metrics in order to reflect the specificities of C++. These 
metrics are as follows: 

• Weighted Methods per Class (WMC). WMC measures the complexity of an individual class. 
Based on [Chidamber&Kemerer, 1994], if we consider all methods of a class to be equally 
complex, then WMC is simply the number of methods defined in each class. In this study, we 
adopted this approach for the sake of simplicity and because the choice of a complexity metric 
would be somewhat arbitrary since it is not fully specified in the metric suite. Thus, WMC is 
defined as being the number of all member functions and operators defined in each class. 
However, "friend" operators (C++ specific construct) are not counted. Member functions and 
operators inherited from the ancestors of a class are also not counted. This definition is 
identical to the one described in [Chidamber&Kemerer, 1995]. The assumption behind this metric 
is that a class with significantly more member functions than its peers is more complex, and by 
consequence tends to be more fault-prone. 

Churcher&Shepperd (1995) have argued that WMC can be measured in different ways 
depending on how member functions and operations defined in a C++ class are counted. We 
believe that the different counting rules proposed by [Churcher&Shepperd, 1995] correspond 
to different metrics, similar to the WMC metric, and which must be experimentally validated as 
well. A validation of Churcher&Shepperd’s WMC-like metrics is, however, beyond the scope 
of this paper. 

• Depth of Inheritance Tree of a class (DIT) - DIT is defined as the maximum depth of the 
inheritance graph of each class. C++ allows multiple inheritance and therefore classes can be 
organized into a directed acyclic graph instead of trees. DIT, in our case, measures the number 
of ancestors of a class. The assumption behind this metric is that well-designed OO systems are 
those structured as forests of classes, rather than as one very large inheritance lattice. In other 
words, a class located deeper in a class inheritance lattice is supposed to be more fault-prone 
because the class inherits a large number of definitions from its ancestors. 

• Number Of Children of a Class (NOC) - This is the number of direct descendants for each 
class. Classes with a large number of children are difficult to modify and usually require more 
testing because the class potentially affects all of its children. Thus, a class with numerous 
children has to provide services in a larger number of contexts and must be more flexible. We 
expect this to introduce more complexity into the class design. 

• Coupling Between Object classes (CBO) - A class is coupled to another one if it uses its 
member functions and/or instance variables. CBO provides the number of classes to which a 
given class is coupled. The assumption behind this metric is that highly coupled classes are 
more fault-prone than weakly coupled classes. So coupling between classes should be 
identified in order to concentrate testing and/or inspections on such classes. 

• Response For a Class (RFC) - This is the number of methods that can potentially be executed 
in response to a message received by an object of that class. In our study, RFC is the number 
of functions directly invoked by member functions or operators of a class. The assumption 
here is that the larger the response set of a class, the higher the complexity of the class, and the 
more fault-prone and difficult to modify it is. 

• Lack of Cohesion on Methods (LCOM) - This is the number of pairs of member functions 
without shared instance variables, minus the number of pairs of member functions with shared 
instance variables. However, the metric is set to 0 whenever the above subtraction is negative. 
A class with low cohesion among its methods suggests an inappropriate design (i.e., the 
encapsulation of unrelated program objects and member functions that should not be together), 
which is likely to be fault-prone. 
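
To make three of the definitions above concrete, the following Python sketch computes LCOM, DIT (as the number of ancestors, following the adjustment described for this study), and NOC on small in-memory descriptions of classes. It is only an illustration under stated assumptions: the study extracted its metrics from C++ source with GEN++, whereas the input formats and function names used here are hypothetical.

```python
from itertools import combinations


def lcom(method_vars):
    """LCOM: pairs of methods sharing no instance variable minus pairs
    sharing at least one, set to 0 when the difference is negative.

    `method_vars` maps each member function name to the set of instance
    variables it uses (a hypothetical input format).
    """
    disjoint = shared = 0
    for vars_a, vars_b in combinations(method_vars.values(), 2):
        if vars_a & vars_b:
            shared += 1
        else:
            disjoint += 1
    return max(disjoint - shared, 0)


def ancestors(cls, parents):
    """Return all ancestors of `cls` in an inheritance DAG.

    `parents` maps a class name to the list of its direct base classes.
    """
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        base = stack.pop()
        if base not in seen:
            seen.add(base)
            stack.extend(parents.get(base, []))
    return seen


def dit(cls, parents):
    """DIT as adjusted in this study: the number of ancestors of a class."""
    return len(ancestors(cls, parents))


def noc(cls, parents):
    """NOC: the number of direct descendants of a class."""
    return sum(1 for bases in parents.values() if cls in bases)
```

With multiple inheritance, a class D derived from B and C, which both derive from A, has three ancestors, so its DIT is 3 under this counting rule, while the NOC of A is 2.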

Readers acquainted with C++ can see that many particularities of C++ are not taken into account by 
Chidamber&Kemerer’s metrics, e.g., C++ templates, friend classes, etc. In fact, additional work 
is necessary in order to extend the proposed OO metric set with metrics specifically tailored to 
C++. 


2.2. Experimental framework 

In order to experimentally validate the OO metrics proposed in [Chidamber&Kemerer, 1994] with 
regard to their capabilities to predict fault probability, we ran a controlled study over four months 
(from September to December, 1994). The population under study was a graduate level class 
offered by the Department of Computer Science at the University of Maryland. The students were 
not required to have previous experience or training in the application domain or OO methods. All 
students had some experience with C or C++ programming and relational databases and therefore 
had the basic skills necessary for such an experiment. 

The students were randomly grouped into 8 teams. Each team developed a medium-sized 
management information system that supports the rental/return process of a hypothetical video 
rental business, and maintains customer and video databases. 

The development process was performed according to a sequential software engineering life-cycle 
model derived from the Waterfall model. This model includes the following phases: Analysis, 
Design, Implementation, Testing, and Repair. At the end of each phase, a document was delivered: 
Analysis document, design document, code, error report, and finally, modified code, respectively. 
Requirement specifications and design documents were checked in order to verify that they 
matched the system requirements. Errors found in these first two phases were reported to the 
students. This maximized the chances that the implementation began with a correct OO 
analysis/design. The testing phase was accomplished by an independent group composed of 
experienced software professionals. This group tested all systems according to similar test plans 
and using functional testing techniques. During the repair phase, the students were asked to correct 
their system based on the errors found by the independent test group. 

OMT, an OO Analysis/Design method, was used during the analysis and design phases 
[Rumbaugh et al., 1991]. The C++ programming language, the GNU software development 
environment, and OSF/MOTIF were used during the implementation. Sparc Sun stations were 
used as the implementation platform. Therefore, the development environment and technology we 
used are representative of what is currently used in industry and academia. 

The following libraries were provided to the students: 

a) MotifApp. This public domain library provides a set of C++ classes on top of OSF/MOTIF for 
manipulation of windows, dialogs, menus, etc. [Young, 1992]. The MotifApp library provides 
a way to use the OSF/Motif widgets in an OO programming/design style. 

b) GNU library. This public domain library is provided in the GNU C++ programming 
environment. It contains functions for manipulation of strings, files, lists, etc. 

c) C++ database library. This library provides a C++ implementation of multi-indexed B-Trees. 

No special training was provided for the students in order to teach them how to use these libraries. 
However, a tutorial describing how to implement OSF/Motif applications was given to the 
students. In addition, a C++ programmer, familiar with OSF/Motif applications, was available to 
answer questions about the use of OSF/Motif widgets and the libraries. A hundred small programs 
exemplifying how to use OSF/Motif widgets were also provided. Finally, the code sources and the 
complete documentation of the libraries were made available. It is important to note that the 
students were not required to use the libraries and, depending on the particular design they 
adopted, different reuse choices were expected. 

We also provided a specific domain application library in order to make our experiment more 
representative of the "real world". This library implemented the graphical user interface for 
insertion/removal of customers and was implemented in such a way that the main resources of the 
OSF/Motif widgets and MotifApp library were used. Therefore, this library contained a small part 
of the implementation required for the development of the rental system. 

2.3. Data Collection 

We collected: (1) the source code of the C++ programs delivered at the end of the implementation 
phase, (2) data about these programs, (3) data about errors found during the testing phase and 
fixes during the repair phase, and (4) the repaired source code of the C++ programs delivered at 
the end of the life cycle. GEN++ [Devanbu, 1992] was used to extract Chidamber&Kemerer’s OO 
design metrics directly from the source code of the programs delivered at the end of the 
implementation phase. To collect items (2) and (3), we used the following forms, which have 
been tailored from those used by the Software Engineering Laboratory [Heller et al., 1992]: 

• Defect Report Form. 

• Component Origination Form. 

In the following sections, we comment on the purpose of the Component Origination and Defect 
Report forms used in our experiment and the data they helped collect. 

2.3.1 Defect Report Form 

This form was used to gather data about (1) the defects found during the testing phase, (2) classes 
changed to correct such defects, and (3) the effort in correcting them. The latter includes: 

• how long it took to determine precisely what change was needed. This includes the effort
required for understanding the change or finding the cause of the error, locating where the 
change was to be made, and determining that all effects of the change were accounted for. 

• how much time it took to implement the correction. This includes design changes, code 
modification, regression testing, and updates to documentation. 

2.3.2 Component Origination Form 

This form is used to record information that characterizes each class under development in the 
project at the time it goes into configuration management. Firstly, this form is used to capture 
whether the class has been developed from scratch or has been developed from a reused class. In 
the latter case, we collected the amount of modification (none, small or large) that was needed to 
meet the system requirements and design as well as the name of the reused class. By small/large, 
we mean that less/more than 25% of the original code had been modified, respectively. However, 
this kind of data was difficult to obtain because we do not have appropriate tools to collect this data 
automatically. As a simplification, we asked the developers to tell us if more or less than 25% of a 
class had been changed. In the former case, the class was labeled: Extensively modified and in the 
latter case: Slightly modified. Classes reused without modification were labeled: verbatim reused. 
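
The labeling rule above can be sketched as a small function. This is only an illustration: the function name, the "New" label for classes developed from scratch, and the treatment of exactly 25% are assumptions, since the text only distinguishes "less" and "more" than 25% and the actual data came from developers' own estimates.

```python
def reuse_category(reused: bool, fraction_modified: float = 0.0) -> str:
    """Label a class with the origin categories recorded on the form."""
    if not reused:
        # Assumed label for classes developed from scratch.
        return "New"
    if fraction_modified == 0.0:
        return "Verbatim reused"
    # Assumption: exactly 25% counts as a slight modification.
    if fraction_modified <= 0.25:
        return "Slightly modified"
    return "Extensively modified"


print(reuse_category(True, 0.10))
print(reuse_category(True, 0.60))
```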

In addition, the name of the sub-system to which the class belonged was also collected. In our 
study, we had three types of sub-systems: graphical user interface (GUI), textual user interface 
(TUI), and database processing (DB). 

3. Analysis of Experimental Results 

In this section, we will attempt to assess experimentally whether the OO design metrics defined in
[Chidamber&Kemerer, 1994] are suitable predictors of fault-prone classes. This will help us
assess these metrics as quality indicators and how they compare to common code metrics. Thus,
we intend to provide the type of empirical validation that we think is necessary before any attempt
to use such metrics as objective and early indicators of quality. Section 3.1 shows the descriptive
distributions of the OO metrics in the studied sample whereas Section 3.2 provides the results of
univariate and multivariate analyses of the relationships between OO metrics and fault-proneness.

3.1. Analysis of Distributions 

Figure 1 shows the distributions of the analyzed OO metrics based on 180 classes present in the 
studied systems. Table 1 provides common descriptive statistics of the metric distributions. These 
results indicate that inheritance hierarchies are somewhat flat (DIT) and that classes have, in 
general, few children (NOC). In addition, most classes show a lack of cohesion (LCOM) near 0.
This latter metric does not seem to differentiate classes well and this stems from its definition 
which prevents any negative measure. This issue will be discussed further in Section 3.2. 



Figure 1: Distribution of the analyzed OO metrics

            WMC        DIT       RFC        NOC       LCOM       CBO
Maximum     99.000     9.0000    105.00     13.000    426.00     30.00
Minimum     1.0000     0.0000    0.0000     0.0000    0.0000     0.000
Median      9.5000     0.0000    19.5000    0.0000    0.0000     5.000
Mean        13.3897    1.3179    33.9141    0.2308    9.7077     6.7962
Std Dev     14.9052    1.9896    33.3703    1.5377    63.7766    7.5614


Table 1: Descriptive statistics of the analyzed OO metrics. 


Descriptive statistics will be useful to help us interpret the results of the analysis in the remainder of 
this section. In addition, they will facilitate comparisons of results from future similar studies. 

3.2 The Relationships between Fault Probability and OO Metrics 

3.2.1 Analysis Methodology 

The response variable we use to validate the OO design metrics is binary, i.e., was a fault detected 
in a class during testing phases? We used logistic regression to analyze the relationship between 
metrics and the fault-proneness of classes. Logistic regression is a classification technique 
[Hosmer&Lemeshow, 1989] used in many experimental sciences based on maximum likelihood 
estimation. In this case, a careful outlier analysis must be performed in order to make sure that the 
observed trend is not the result of a few observations [Dillon&Goldstein, 1984], even though 
logistic regression is deemed to be more robust to outliers than least-square regression.

In particular, we first used univariate logistic regression, to evaluate the relationship of each of the 
metrics in isolation and fault-proneness. Then, we performed multivariate logistic regression, to 
evaluate the predictive capability of those metrics that had been assessed sufficiently significant in 
the univariate analysis (e.g., α < 0.10 is a reasonable heuristic). This modeling process is further
described in [Hosmer&Lemeshow, 1989]. 
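
The screening step can be illustrated with a minimal sketch. The significance levels are the ones reported in Table 2 for the models fitted on all classes; the function name and the dictionary layout are assumptions made for illustration, and LCOM is left out because no value is reported for it.

```python
# Significance levels of the univariate models fitted on all classes,
# copied from Table 2. LCOM is omitted: it is reported as insignificant
# and no value is given.
univariate_alpha = {
    "WMC": 0.0607,
    "DIT": 0.0000,
    "RFC": 0.0000,
    "NOC": 0.0000,
    "CBO": 0.0000,
}


def screen(alphas, threshold=0.10):
    """Keep the metrics whose univariate significance level is below the threshold."""
    return sorted(name for name, alpha in alphas.items() if alpha < threshold)


candidates = screen(univariate_alpha)
print(candidates)
```

With the stricter threshold of 0.05, WMC would drop out of the candidate set, which shows how much the choice of heuristic matters for borderline metrics.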

A multivariate logistic regression model is based on the following relationship equation (the 
univariate logistic regression model is a special case of this, where only one variable appears): 

$$\log\left(\frac{p}{1-p}\right) = C_0 + C_1 X_1 + C_2 X_2 + \cdots + C_n X_n \qquad (1)$$



where p is the probability that a fault will be found in a class during the validation phase, and the
Xi's are the OO metrics included as predictors in the model (called covariates of the logistic
regression equation). In the two extreme cases, i.e., when a variable is either non-significant or
entirely differentiates fault-prone classes, the curve (between p and any single Xi, i.e., assuming
that all other Xj's are constant) approximates a horizontal line and a vertical line, respectively. In
between, the curve takes a flexible S shape. However, since p is unknown, the coefficients Ci will
be estimated through a likelihood function optimization [Hosmer&Lemeshow, 1989]. This
procedure assumes that all observations are statistically independent. When building the regression
equations, each observation was weighted according to the number of faults detected in each class.
The rationale is that each detection of a fault is considered as an independent event. Classes where
no faults were detected were weighted 1.
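
To make the shape of the curve concrete, equation (1) can be solved for p. The sketch below does this in plain Python; the function name and the coefficient values are hypothetical and were chosen only to show a flat curve (coefficient 0) next to an S-shaped one.

```python
import math


def probability(intercept, coefficients, metrics):
    """Solve equation (1) for p, given the metric values of one class."""
    z = intercept + sum(c * x for c, x in zip(coefficients, metrics))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical single-covariate models, chosen only to show the shapes.
flat = [probability(0.0, [0.0], [x]) for x in range(0, 50, 10)]
s_curve = [probability(-2.5, [0.1], [x]) for x in range(0, 50, 10)]

print(flat)
print([round(p, 3) for p in s_curve])
```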

Tables 2 and 3 contain the results we obtained through, respectively, univariate and multivariate 
logistic regression on all of the 180 classes. We report those related to the metrics that turned out to 
be the most significant across all eight development projects. For each metric, we provide the 
following statistics: 

• Coefficient (appearing in Tables 2 and 3), the estimated regression coefficient. The larger the 
coefficient in absolute value, the stronger the impact of the explanatory variable on the 
probability p of a fault to be detected in a class. 

• Δψ (appearing in Table 2 only), which is based on the notion of odds ratio
[Hosmer&Lemeshow, 1989], and provides an evaluation of the impact of the metric on the
response variable. More specifically, the odds ratio ψ(X) represents the ratio between the
probability of having a fault and the probability of not having a fault when the value of the
metric is X. As an example, if, for a given value X, ψ(X) is 2, then it is twice as likely that the
class does contain a fault than that it does not contain a fault. The value of Δψ is computed by
means of the following formula:

$$\Delta\psi = \frac{\psi(X+1)}{\psi(X)} \qquad (2)$$

Therefore, Δψ represents the reduction/increase in the odds ratio when the value X increases by
1 unit. This provides a more intuitive insight than regression coefficients into the impact of
explanatory variables.

• The level of significance (α, appearing in Tables 2 and 3) provides an insight into the accuracy
of the coefficient estimates. It tells the reader about the probability of the coefficient being
different from zero by chance. Usually, a level of significance of α = 0.05 (i.e., 5%
probability) is used as a threshold to determine whether an explanatory variable is a significant
predictor. However, the choice of a particular level of significance is ultimately a subjective
decision and other levels such as α = 0.01 or 0.1 are common. Also, the larger the level of
significance, the larger the standard deviation of the estimated coefficients, and the less 
believable the calculated impact of the explanatory variables. The significance test is based on a 
likelihood ratio test [Hosmer&Lemeshow, 1989] commonly used in the framework of logistic 
regression. 
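
For a univariate model, equation (1) gives ψ(X) = p/(1-p) = exp(C0 + C1·X), so the ratio ψ(X+1)/ψ(X) reduces to exp(C1) and does not depend on X. The short sketch below applies this to a few coefficients from Table 2 as a consistency check; the function name is an assumption.

```python
import math


def delta_psi(coefficient):
    """Multiplicative change in the odds when the metric grows by 1 unit."""
    return math.exp(coefficient)


# Coefficients copied from Table 2.
rows = [("WMC (1)", -0.022), ("DIT (1)", -0.485), ("DIT (2)", -0.868), ("RFC (4)", -0.108)]
for name, coefficient in rows:
    print(f"{name}: {100 * delta_psi(coefficient):.0f}%")
```

The printed percentages can be compared with the Δψ column of Table 2 for the same rows.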

Based on equation (1), the likelihood function of a data set of size D is: 

$$L = \prod_{i=1}^{D} \pi(X_i) \qquad (3)$$

where:

$$\pi(X_i) = \frac{e^{(C_0 + C_1 X_{i1} + \cdots + C_n X_{in})\,Y_i}}{1 + e^{(C_0 + C_1 X_{i1} + \cdots + C_n X_{in})}} \qquad (4)$$

where Yi is assigned the value 1 if the class does not contain any fault, 0 otherwise. The
n-dimensional vectors Xi contain the OO design metrics characterizing each of the D observations.
Also, π(Xi) represents the estimated probability for a class to contain (or not, depending on which
is the case) a fault. The coefficients that will maximize the likelihood function will be the regression
coefficient estimates. For mathematical convenience, l = Ln[L], the log-likelihood, is usually
maximized.
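
As a minimal sketch, the log-likelihood of equations (3) and (4) can be evaluated directly for a candidate set of coefficients. The function name and the optional `weights` argument are assumptions; in particular, applying each observation's weight as a multiplier on its log-term is one common way to weight observations and is not confirmed by the text.

```python
import math


def log_likelihood(intercept, coefficients, X, Y, weights=None):
    """l = Ln[L] for equations (3)-(4); Y[i] is the 0/1 response of observation i."""
    if weights is None:
        weights = [1.0] * len(Y)
    total = 0.0
    for x, y, w in zip(X, Y, weights):
        z = intercept + sum(c * xi for c, xi in zip(coefficients, x))
        # Ln pi(X_i) = z * Y_i - Ln(1 + e^z), from equation (4).
        total += w * (z * y - math.log1p(math.exp(z)))
    return total


# With all coefficients at 0, every pi(X_i) equals 0.5.
X = [[1.0], [2.0], [3.0], [4.0]]
Y = [0, 1, 0, 1]
print(log_likelihood(0.0, [0.0], X, Y))
```

An optimizer would search for the intercept and coefficients that make this value as large as possible.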


One of the global measures of goodness of fit we will use for logistic regression models is R², a
statistic defined as:

$$R^2 = \frac{l_0 - l_n}{l_0 - l_s}$$

where 

• l_0 is the log-likelihood function without using any covariate (just the intercept),

• l_n is the log-likelihood of the model including the n selected design metrics as covariates,

• l_s is the log-likelihood of the saturated model, i.e., where Yi (0 or 1) is substituted for each
probability π(Xi) in l. The log-likelihood l_s is the maximum value that can be assigned to l.

The higher the R², the more accurate the model. However, as opposed to the R² of least-square
regression, high R²'s are rare for logistic regression because l_n rarely approaches the value of l_s
since the computed π(Xi)'s in l_n rarely approach 1. The interested reader may refer to
[Hosmer&Lemeshow, 1989] for a detailed introduction to logistic regression. Finally, R² may be
described as a measure of the proportion of total uncertainty that is attributed to the model fit.
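
The statistic is simple to compute once the three log-likelihoods are known. The sketch below assumes unweighted binary data, for which the saturated model assigns probability 1 to every observed outcome, so that the saturated log-likelihood is 0; the function names and the example values are hypothetical.

```python
import math


def null_log_likelihood(Y):
    """Log-likelihood of the intercept-only model (requires both outcomes in Y)."""
    n, k = len(Y), sum(Y)
    q = k / n
    return k * math.log(q) + (n - k) * math.log(1.0 - q)


def logistic_r2(l0, ln, ls=0.0):
    """R^2 = (l0 - ln) / (l0 - ls); ls defaults to 0 (assumed saturated model)."""
    return (l0 - ln) / (l0 - ls)


# Hypothetical log-likelihood values, for illustration only.
print(logistic_r2(-100.0, -75.0))
print(null_log_likelihood([1, 1, 0, 0]))
```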

3.2.2 Univariate Analysis 

In this section, we analyze the six OO metrics introduced in [Chidamber&Kemerer, 1994] (though
slightly adapted to our context) with regard to the probability of fault detection in a class during test
phases. In our case, it is equivalent for the logistic model to calculate the probability of a single
fault to be detected in a class.

• Weighted Methods per Class (WMC) was shown to be somewhat significant (α = 0.06)
overall. For new and extensively modified classes and for UI (Graphical and Textual User
Interface) classes, the results are much better: α = 0.0003 and α = 0.001, respectively. As
expected, the larger the WMC, the larger the probability of fault detection. These results can be 
explained by the fact that the internal complexity does not have a strong impact if the class is 
reused verbatim or with very slight modifications. In that case, the class interface properties 
will have the most significant impact. 

• Depth of Inheritance Tree of a class (DIT) was shown to be very significant (α = 0.0000)
overall. As expected, the larger the DIT, the larger the probability of defect detection. Again,
results improve (logistic R² goes from 0.06 to 0.13) when only new and extensively modified
classes are considered. 

• Response For a Class (RFC) was shown to be very significant overall (α = 0.0000).
Predictably, the larger the RFC, the larger the probability of defect detection. However, the
logistic R² improved significantly for new and extensively modified classes and UI classes
(from 0.06 to 0.24 and 0.36, respectively). Reasons are believed to be the same as for WMC
for extensively modified classes. In addition, UI classes show a distribution which is 
significantly different from that of DB classes: the mean and median are significantly higher. 
This, as a result, may strengthen the impact of RFC when performing the analysis. 

• Number Of Children of a Class (NOC) appeared to be very significant (except in the case of UI 
classes) but the observed trend is contrary to what was expected. The larger the NOC, the 
lower the probability of defect detection. This surprising trend can be explained by the 
combined facts that most classes do not have more than one child and that verbatim reused 
classes are somewhat associated with a large NOC. Since we have observed that reuse was a 
significant factor in fault density [Melo et. al., 1995], this explains why large NOC classes are
less fault-prone. Moreover, there is some instability across class subsets with respect to the
impact of NOC on the probability of detecting a fault in a class (see Δψ's in Table 2). This may
be explained in part by the lack of variability on this measurement scale (see distributions in 
Figure 1). 

• Lack of Cohesion on Methods (LCOM) was shown to be insignificant in all cases (this is why
the results are not shown in Table 2) and this should be expected since the distribution of 
LCOM shows a lack of variability and a few very large outliers. This stems in part from the 
definition of LCOM where the metric is set to 0 when the number of method pairs sharing
instance variables is larger than that of the ones not sharing any instance variables. This definition is
definitely not appropriate in our case since it sets cohesion to 0 for classes with very different 
cohesions and keeps us from analyzing the actual impact of cohesion based on our data sample. 

• Coupling Between Object classes (CBO) is significant and more particularly so for UI classes
(α = 0.0000 and R² = 0.17). No satisfactory explanation could be found for differences in
pattern between UI and DB classes. 
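
The floor at 0 in the LCOM definition can be illustrated with a short sketch. It assumes the standard formulation from Chidamber and Kemerer, in which P counts the method pairs that share no instance variable and Q counts the pairs that share at least one, with LCOM = P - Q when P > Q and 0 otherwise; the study used slightly adapted metrics, so details may differ. The class descriptions below are hypothetical.

```python
from itertools import combinations


def lcom(method_vars):
    """method_vars maps each method name to the set of instance variables it uses."""
    p = q = 0
    for a, b in combinations(method_vars.values(), 2):
        if a & b:
            q += 1  # the pair shares at least one instance variable
        else:
            p += 1  # the pair shares nothing
    return max(p - q, 0)


# Hypothetical classes with clearly different cohesion.
cohesive = {"m1": {"x", "y"}, "m2": {"y", "z"}, "m3": {"x", "z"}}
mixed = {"m1": {"x"}, "m2": {"x"}, "m3": {"x"}, "m4": {"y"}}
disjoint = {"m1": {"x"}, "m2": {"y"}, "m3": {"z"}}

print(lcom(cohesive), lcom(mixed), lcom(disjoint))
```

The first two classes both receive 0 although one is fully cohesive and the other contains an unrelated method, which is the loss of discrimination described for LCOM above.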

It is important to remember, when looking at the results in Table 2, that the various metrics have 
different units. Some of these units represent "big steps" on each respective measurement scale 
while others represent "smaller steps". As a consequence, some coefficients show a very small
impact (i.e., Δψ's) when compared to others. This is not, however, a valid criterion to evaluate the
predictive usefulness of such metrics. 

Most importantly, besides NOC, all metrics appear to have a very stable impact across various 
categories of classes (i.e., DB, UI, New-Ext, etc.). This is somewhat encouraging since it tells us 
that, in that respect, the various types of components are comparable. If we were considering 
different types of faults separately, results might be different. Such a refinement is, however, part 
of our future research plans. 

Metrics    Coefficient    Δψ       α          R²        Classes
WMC (1)    -0.022         98%      0.0607     0.007     ALL
WMC (2)    -0.086         92%      0.00035    0.024     New-Ext
WMC (3)    -0.027         103%     0.0656     0.0154    DB
WMC (4)    -0.0944        91%      0.0019     0.0467    UI
DIT (1)    -0.485         62%      0.0000     0.0648    ALL
DIT (2)    -0.868         42%      0.0000     0.1314    New-Ext
DIT (3)    -0.475         62%      0.043      0.0187    DB
DIT (4)    -0.29          75%      0.024      0.017     UI
RFC (1)    -0.085         92%      0.0000     0.0648    ALL
RFC (2)    -0.087         92%      0.0000     0.2477    New-Ext
RFC (3)    -0.077         93%      0.0000     0.188     DB
RFC (4)    -0.108         90%      0.0000     0.3624    UI
NOC (1)    3.3848         3000%    0.0000     0.1426    ALL
NOC (2)    3.62           3734%    0.0011     3.6235    New-Ext
NOC (3)    2.05           777%     0.0000     0.0826    DB
CBO (1)    -0.142         87%      0.0000     0.068     ALL
CBO (2)    -0.079         92%      0.017      0.02      New-Ext
CBO (3)    -0.086         92%      0.006      0.034     DB
CBO (4)    -0.284         75%      0.0000     0.17      UI


Table 2: Univariate Analysis - Summary of experimental results. 


3.2.3 Multivariate Analysis 

The OO design metrics presented in the previous section can be used early in the life cycle to build 
a predictive model of fault-prone classes. In order to obtain an optimal model, we included these 
metrics into a multivariate logistic regression model. However, only the metrics that significantly 
improve the predictive power of the multivariate model were included through a stepwise selection 
process. Another significant predictor of fault-proneness is the level of reuse of the class (called 
“Origin” in Table 3). This information is available at the end of the design phase when reuse
candidates have been identified in available libraries and the required amount of change can be 
estimated. Table 3 describes the computed multivariate model. Using such a model for 
classification, the results shown in Table 4 are obtained by using a classification threshold of 
p(Fault detection) = 0.5 for the probability of detecting a single defect in a given class, i.e., when 
p > 0.5, the class is classified as faulty and otherwise as non-faulty. As expected, classes 
predicted as faulty contain a large number of faults (250 faults on 48 classes) because those classes 
tend to show a better classification accuracy.

We now assess the impact of using such a prediction model by assuming, in order to simplify
computations, that inspections of classes are 100% effective in finding faults. In that case, 80 
classes (predicted as faulty) out of 180 would be inspected and 48 faulty classes out of 58 would 
be identified before testing. If we now take into account individual faults, 250 faults out of 268
would be detected during inspection. As mentioned above, such a good result stems from the fact 
that the prediction model is more accurate for multiple-faults classes. 



             Coefficient    α
Intercept    3.13           0.0000
DIT          -0.50          0.0004
RFC          -0.11          0.0000
NOC          2.01           0.0178
RFC          -0.13          0.0072
CBO          -0.238         0.0001
Origin       -1.84          0.0000


Table 3: Multivariate Analysis with OO design metrics 


                    Predicted
Actual      No fault    Fault
No Fault    90          32
Fault       10 (18)     48 (250)


Table 4: Classification Results with OO Design Metrics. The figures before parentheses in the right 
column are the number of classes classified as faulty. The figures between the parentheses are the 
faults contained in those classes. 


In order to evaluate the predictive accuracy of these OO design metrics, it would be interesting to
compare their predictive capability with that of the usual code metrics, which can only be obtained
later in the development life cycle. Three code metrics, among the ones provided by the Amadeus
tool [Amadeus, 1994], were selected through a stepwise regression procedure. Table 5 shows the
resulting parameter estimations of the multivariate logistic regression model where: MaxStatNest is
the maximum level of statement nesting in a class, FunctDef is the number of function declarations,
and FunctCall is the number of function calls. However, based on the whole set of metrics
provided by Amadeus, other multivariate models yield results of similar accuracy. This model
happens to be, however, the model resulting from the use of a standard, stepwise logistic
regression analysis procedure.



               Coefficient    α
Intercept      0.39           0.0384
MaxStatNest    -0.286         0.0252
FunctDef       0.166          0.0010
FunctCall      -0.0277



Table 5: Multivariate Analysis with Code Metrics 


In addition to being collectable only later in the process, code metrics appear to be somewhat
poorer as predictors of class fault-proneness (see Table 6). In this case, 112 classes (predicted as
faulty) out of 180 would be inspected and 51 faulty classes out of 58 would be detected. If we now
take into account individual faults, 231 faults out of 268 would be detected during inspection.
Three more faulty classes would be corrected (51 versus 48) but 32 more classes would have to be
inspected (112 versus 80). Moreover, the OO design metrics are better predictors of classes
containing large numbers of faults since 19 more faults (250 versus 231) would be detected in that
case. Therefore, predictions based on code metrics appear to be poorer. Table 7 confirms that
result by showing the values of correctness (percentage of classes correctly predicted as faulty) and
completeness (percentage of faulty classes detected). Values between parentheses present
predictions' correctness and completeness values when classes are weighted according to the
number of faults they contain (classes with no fault are weighted 1).
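
Two of these accuracy figures can be recomputed directly from the class and fault counts in Tables 4 and 6. The sketch below does so for the unweighted correctness and the fault-weighted completeness; the dictionary keys and function names are assumptions introduced for illustration.

```python
# Counts copied from Table 4 (OO design metrics) and Table 6 (code metrics).
# fp: non-faulty classes predicted as faulty; tp / fn: faulty classes
# predicted as faulty / not faulty, with the faults those classes contain.
oo = {"fp": 32, "tp": 48, "fn": 10, "tp_faults": 250, "fn_faults": 18}
code = {"fp": 61, "tp": 51, "fn": 7, "tp_faults": 231, "fn_faults": 37}


def correctness(m):
    """Share of the classes predicted as faulty that really are faulty."""
    return m["tp"] / (m["tp"] + m["fp"])


def weighted_completeness(m):
    """Share of all faults located in classes predicted as faulty."""
    return m["tp_faults"] / (m["tp_faults"] + m["fn_faults"])


for name, m in (("OO metrics", oo), ("Code metrics", code)):
    print(f"{name}: correctness {100 * correctness(m):.1f}%, "
          f"weighted completeness {100 * weighted_completeness(m):.1f}%")
```

Both models are evaluated on the same 180 classes and 268 faults, so the figures are directly comparable.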


                    Predicted
Actual      No fault    Fault
No Fault    61          61
Fault       7 (37)      51 (231)


Table 6: Classification Results based on code metrics shown in Table 5


Model Accuracy    OO metrics    Code metrics
Completeness      88% (93%)     83% (86%)
Correctness       60% (92%)     45.5% (86%)


Table 7: Classification Accuracies based on OO and code metrics shown in Table 3 and Table 5

4. Related Work


As far as we know, the only studies attempting to experimentally validate OO metrics are
[Li&Henry, 1993] and [Briand et. al., 1994]. In [Briand et. al., 1994], metrics for measuring
abstract data type (ADT) cohesion and coupling are proposed and are experimentally validated as
predictors of faulty ADT's. Further work will consist of verifying that the metrics proposed by
[Briand et. al., 1994] are also applicable to C++ programs, in a context of inheritance.

To the knowledge of the authors, [Li&Henry, 1993] is the only study which can really be
compared to the work we describe in this paper. Li and Henry have proposed a suite of OO design
metrics. They validated this suite of metrics by studying the number of changes performed in two
commercial systems implemented with an OO dialect of Ada. The suite of OO design metrics used
by Li and Henry extends Chidamber&Kemerer’s OO metrics with two additional metrics:

• Message Passing Coupling (MPC) which is calculated as the number of send statements 
defined in a class. 

• Data Abstraction Coupling (DAC) which is calculated as the number of abstract data types used 
in the measured class and defined in another class of the system. 

They combined Chidamber&Kemerer’s six OO metrics with these last two metrics in a single
least-square regression model. According to the authors, their model was adequate in predicting the 
size of changes in classes during the maintenance phase. They did not, however, look at the time 
spent changing a class nor the cause of changes (e.g., corrections, enhancement, etc.). In addition, 
they assumed that the number of modifications in a class is proportional to the effort spent to 
change it, which is not necessarily true. Also, we do not believe that the number of changes can be 
considered as a measure of maintainability since it is not dependent on the modifiability of a class 
but on the correctness and functional stability of the class. 

In this study, we did not consider DAC and MPC because they could not be directly applied in our
experimental context (C++ does not provide send statements). Based on the way DAC was
defined by Li&Henry, it cannot be directly used for C++. DAC could, however, be
redefined/tailored to our needs, providing another way to calculate coupling across C++ classes.
This is, however, beyond the scope of this paper.

An important difference in our work is that we have used the occurrence of faults in a class to 
verify whether Chidamber&Kemerer’s OO metrics were adequate quality predictors. Of course, 
many other quality measures of interest could be used in this context, e.g., change productivity. 
Last, the modeling technique we used (i.e., logistic regression) to predict fault-prone classes is 
different because of the nature of the dependent variable which is binary in our case. This has led 
us to use a classification technique. 

5. Conclusions and further work 

In this experiment, we collected data about defects found in Object-Oriented classes. Based on 
these data we verified experimentally how much fault-proneness is influenced by internal (e.g., 
size, cohesion) and external (e.g., coupling) design characteristics of OO classes. From the results 
presented above, several of Chidamber&Kemerer’s OO metrics appear to be adequate to predict 
class fault-proneness during the early phases of the life-cycle. We also showed that 
Chidamber&Kemerer’s OO metrics are better predictors than “traditional” code metrics on our data 
set, which, in addition, can only be collected at a later phase of the software development 
processes. 

Our future work includes: 

• replicating this study in an industrial setting: a sample of large-scale projects developed in C++ 
and Ada95 in the framework of the NASA Goddard Flight Dynamics Division (Software 
Engineering Laboratory). This work should help us better understand the prediction capabilities 
of the suite of OO metrics described in this paper. By doing that, we intend to: 

    ◦ build models and provide guidance to improve the allocation of resources with
      respect to test and verification efforts,

    ◦ gain a better understanding of the impact of OO design strategies (e.g., simple
      versus multiple inheritance) on defect density and rework. In this study, because of
      an inadequate data collection process, we were unable to analyze the capability of
      OO design metrics to predict rework. We believe that this drawback could be
      overcome by refining our data collection process in order to capture how much
      effort was spent on each class individually.

• analyzing OO libraries in order to identify “good” and “bad” OO design patterns. Design
patterns have been claimed to be a way to improve reuse and quality of OO software systems
[Gamma et. al, 1995]. We intend to use the approach described in this paper to assess
organization-specific design patterns, thus providing guidelines about what OO design patterns
should be encouraged and which ones should be avoided due to their fault-proneness or their 
lack of maintainability. 

• studying the variations, in terms of metric definitions and experimental results, between 
different OO programming languages. The fault-proneness prediction capabilities of the suite of 
OO metrics discussed in this paper can be different depending on the used programming 
language. Work must be undertaken to validate this suite of OO design metrics across different
OO languages, e.g., Ada95, Smalltalk, Eiffel, C++, etc.

• extending the experimental investigation to other OO metrics proposed in the literature (e.g.,
[Abreu&Carapuça, 1994]) and developing new metrics, e.g., more language specific.

For the past five years, the Flight Dynamics Division
(FDD) at NASA’s Goddard Space Flight Center has
been carrying out a detailed domain analysis effort and
is now beginning to implement Generalized Support
Software (GSS) based on this analysis. GSS is part of
the larger Flight Dynamics Distributed System
(FDDS), and is designed to run under the FDDS User
Interface / Executive (UIX). The FDD is transitioning
from a mainframe based environment to FDDS based
systems running on engineering workstations. The
GSS will be a library of highly reusable components
that may be configured within the standard FDDS
architecture to quickly produce low-cost satellite
ground support systems. The estimate for the first
release is that this library will contain approximately
200,000 lines of code.

The main driver for developing generalized 
software is development cost and schedule 
improvement. The goal is to ultimately have at least 
80 percent of all software required for a spacecraft 
mission (within the domain supported by the GSS) to 
be configured from the generalized components. 


Domain Analysis 

The GSS domain analysis effort originally grew out of 
a study of the feasibility of generalizing the attitude 
ground support systems (AGSSs) produced by the 
FDD for various spacecraft missions. FDD software 
tends to be similar from mission to mission. An AGSS 
is used to determine the orientation of a spacecraft 
from on-board sensor data and to compute maneuvers 
to change that orientation. It typically has several 
executable programs that are used for specialized areas 
such as attitude estimation and sensor calibration. 
These programs share models to varying degrees. For 
example, just about every FDD system has an orbit 
propagator in it. Part of the domain analysis effort is 
intended to reduce overlap and redundancy between 
systems. 

As part of an ambitious project to re-engineer a 
majority of the FDD software systems, the domain 
covered by the analysis was later expanded to also 
include a number of mission analysis and planning 
functions. Indeed, at one point plans called for this 
project to eventually encompass all FDD 
functionality, adding orbit models to the attitude and 
mission planning functionality. 

Project History

The domain analysis effort began by studying the 
functional specifications of existing AGSSs. These 
specifications used data flow diagrams, so it was 
natural to adopt this technique for the generalized 
domain model. However, the limitations of this 
approach soon became apparent, especially in the lack 
of classification techniques crucial to capturing 
generalizations. Despite the fact that most of the 
people working on the effort were not particularly 
familiar with object-oriented approaches, a consensus 
developed that object-oriented analysis would be a 
better technique than data-flow diagrams for our 
purposes. Following this decision, we developed a
Specification Concepts document [Seidewitz 91] that 
captured the object-oriented analysis approach used in 
subsequent analysis. 

Unfortunately, budgetary pressures prevented the 
ambitious re-engineering plans from becoming reality.
Further, the expanding scope of the analysis effort
became increasingly difficult to handle. Thus, the
domain analysis effort was refocused generally to 
concentrate once again on the attitude support domain. 
The end effect was that the domain analysis team did 
not increase as planned, leaving a small team to do the 
analysis over several years. The effort has specifically 
proceeded to focus in detail on the analysis of the first 
two GSS releases: telemetry simulation and real-time 
attitude determination. We have now completed two 
versions of the generalized specifications for the first 
release [Klitsch 93] and work is proceeding on the 
specifications for the second release [Klitsch 94].

Specification Concepts 

The specification products of the domain analysis 
effort are all based on our standard specification 
concepts. Actually, these specification concepts have 
continued to evolve based on our analysis experiences 
[Seidewitz 93]. Throughout this process there has been 
a continual tension between keeping the concepts as 
simple as possible and assuring that they are powerful 
enough to allow specification of domain functionality 
without undue complication. The core concepts of the
model include the basic object-oriented principles of
classes, objects and messages. Additional concepts 
have been added to this core only when not including 
the new concept would make it difficult or impossible 
to clearly specify some specific domain functionality 
under consideration. 

For example, we have used only two levels of 
classification of objects. Each specific object class 
belongs to exactly one superclass that represents a 
general domain category (e.g., a Sun Sensor would be 
in the Sensor category). Further, superclasses only 
specify common interfaces, not common functionality, 
so there is no inheritance of functionality by 
subclasses. This restricted approach has allowed us to 
cleanly and simply introduce the required 
generalization concepts while maintaining the locality 
of specification of the functionality of any class. The 
approach worked well through the first versions of our 
specifications. However, current work is indicating an 
increasing number of opportunities where deeper 
classification hierarchies would be useful, and we may 
add this to our concepts. 
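
The two-level scheme described above can be sketched in
Python as follows. This is only an illustration under
stated assumptions: the measure operation and the
Magnetometer class are hypothetical names, not taken
from the GSS specifications. The category declares an
interface and nothing else, so each specific class
carries all of its own functionality.

```python
from abc import ABC, abstractmethod
from typing import List


class Sensor(ABC):
    """Category: declares the common interface, no functionality."""

    @abstractmethod
    def measure(self, time: float) -> List[float]:
        """Return a measurement vector for the given time."""


class SunSensor(Sensor):
    """Specific class: its functionality is specified locally."""

    def measure(self, time: float) -> List[float]:
        # Placeholder model: a fixed unit vector toward the Sun.
        return [1.0, 0.0, 0.0]


class Magnetometer(Sensor):
    """A second specific class in the same category (hypothetical)."""

    def measure(self, time: float) -> List[float]:
        # Placeholder model: a fixed magnetic field direction.
        return [0.0, 0.0, 1.0]


# Any object in the Sensor category can be used through the interface.
readings = [sensor.measure(0.0) for sensor in (SunSensor(), Magnetometer())]
```

Because the category holds no code of its own, reading
one class never requires reading its superclass, which
is the locality property the text refers to.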

Another restriction in our concepts is that objects 
are not dynamically created or destroyed. Instead, 
objects and their interdependencies are specified as 
part of the configuration of an application. Once these 
objects are created, they exist for the duration of the 
execution of the application. Data passed between 
objects is not itself object-oriented, but is instead 
drawn from a set of standard data types (Integer, Real, 
Vector, Matrix, etc.). This approach provides us with a 
clear definition of configuration, which was a topic of 
many long discussions. The resolution of these 
discussions was that the generalized specifications deal 
exclusively with the definition of classes, while the 
configuration specifications deal exclusively with the 
definition of the objects in an application. This 
philosophy also provides a fundamental connection to 
our implementation approach. 
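
The split between class definitions and object
definitions might be pictured with the following Python
sketch. The configuration layout, the object names, and
the AttitudeEstimator class name are assumptions made
for illustration only; the actual GSS configuration
format is not shown in this paper. All objects are
allocated first and connected afterwards, and none is
created or destroyed once the application is running.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConfiguredObject:
    """One object named in the configuration of an application."""
    name: str
    class_name: str
    dependencies: Dict[str, "ConfiguredObject"] = field(default_factory=dict)


def build_application(config: Dict[str, dict]) -> Dict[str, ConfiguredObject]:
    """Allocate every configured object, then set the dependencies."""
    # Pass 1: allocate all objects so that any object can be a target.
    objects = {
        name: ConfiguredObject(name, entry["class"])
        for name, entry in config.items()
    }
    # Pass 2: connect objects; a KeyError flags an undefined target.
    for name, entry in config.items():
        for role, target in entry.get("depends_on", {}).items():
            objects[name].dependencies[role] = objects[target]
    return objects


# Hypothetical configuration for one application.
CONFIG = {
    "sun_sensor_1": {"class": "SunSensor"},
    "estimator": {
        "class": "AttitudeEstimator",
        "depends_on": {"sensor": "sun_sensor_1"},
    },
}

app = build_application(CONFIG)
```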

Besides restrictions in using object-oriented 
concepts, the specification concepts evolved to 
eliminate unnecessary and sometimes complex 
concepts. For example, the original concepts called 
for modeling separate subsystems that only 
communicate via data objects. These subsystems were 
intended to be configured as separate executable 
programs. This made it hard to specify models (such as 
estimation algorithms) that are usable in more than one 
subsystem (such as attitude determination and sensor 
calibration). The solution was to create a single 
domain map, and replace the subsystem driver with 
application categories that provide the same set of 
actions to the UIX. These application categories also 
map to separately configured programs, but can draw 
on classes throughout the domain map, instead of 
classes contained in a single subsystem. 

Lessons Learned 

The current specifications are defined with more detail 
and less ambiguity than the typical FDD specification 
documents. This has had a positive impact on the 
development process, since class specifications are 
generally detailed enough to serve as PDL. However, 
these specifications are harder for the analyst to 
understand when specifying the configuration of an 
application program for a given satellite. The 
generalized specification document is currently weak 
at showing how an entire application would behave. 
One reason is that the specification effort has focused 
the limited resources on producing class specifications 
to implement, at the expense of producing information 
that the analysts would use when defining a 
configuration. 

A more important reason is that FDD attitude and 
orbit analysts don't think in terms of objects, but in 
terms of algorithms such as a Kalman Filter estimation 
algorithm. The concept of this algorithm can be 
expressed to the mathematician in 5 or 6 equations. To 
understand the GSS specification, the analyst needs to 
understand how several classes contribute to the 
processing needed to implement these equations. The 
specification concepts need to be updated to improve 
the description of how classes interact to support an 
algorithm. Part of the answer is to complete the 
intended documentation for each subdomain (major 
group of categories) to explain these interactions. The 
concept of "scenarios" or "use cases" (as discussed, 
e.g., by [Jacobson 92]) may be appropriate for describing 
the overall behavior of an application. 

Another key lesson for domain analysis is that 
developers need to be involved in the process. This is 
primarily because the class specifications are written at 
a level of detail that often raises implementation issues 
such as performance. The GSS project has always had 
developer involvement in the domain analysis process. 
This process may be improved by increasing this 
involvement, perhaps even evolving towards a joint 
analysis / development team. This is because as more 
classes are implemented the developers have a greater 
stake in making sure that new analysis work won’t 
have any negative effects on the existing class library. 

Development 

The creation of a generalized design is made possible 
by the standardization of class specifications in the 
Specification Concepts document, and by the 
standardization of the interaction between the UIX and 
the GSS application [Booth 93]. The UIX drives 
application processing by calls to three operations 
provided by the application. These operations allow 
the user to access and modify parameters, or to execute 
the next action in the application. The application may 
also send messages to the UIX. 
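
A rough Python sketch of such an interface follows. It
reads the three operations as "get a value", "set a
value", and "execute the next action"; that reading, the
operation names, and the propagate action are all
assumptions made for illustration, not the interface
defined in [Booth 93].

```python
from typing import Callable, Dict, List


class Application:
    """Hypothetical application exposing three operations to the UIX."""

    def __init__(self, parameters: Dict[str, float],
                 actions: List[Callable[[Dict[str, float]], str]]) -> None:
        self._parameters = dict(parameters)
        self._actions = list(actions)
        self._next = 0
        self.messages: List[str] = []  # messages sent back to the UIX

    def get_parameter(self, name: str) -> float:
        """Operation 1: let the user inspect a value."""
        return self._parameters[name]

    def set_parameter(self, name: str, value: float) -> None:
        """Operation 2: let the user modify an existing value."""
        if name not in self._parameters:
            raise KeyError(name)
        self._parameters[name] = value

    def execute_next_action(self) -> bool:
        """Operation 3: run the next action; False when none remain."""
        if self._next >= len(self._actions):
            return False
        action = self._actions[self._next]
        self._next += 1
        self.messages.append(action(self._parameters))
        return True


def propagate(params: Dict[str, float]) -> str:
    """Hypothetical action that reports the step size it used."""
    return "propagated with step %.1f" % params["step"]


# The UIX side: adjust a value, then drive the application to completion.
app = Application({"step": 1.0}, [propagate])
app.set_parameter("step", 2.0)
while app.execute_next_action():
    pass
```

Keeping the interface this narrow is what lets a single
user interface drive any application built from the
class library.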

The key feature of a GSS application is that it is 
built from a library of classes, and can then be 
configured at run time. The run time configuration 
process includes allocating the objects for each class, 
setting the specific dependencies between objects (the 
generalized specifications define dependencies 
between classes, which are implemented at compile 
time using the Ada generic parameters), and setting 
default parameter values. 

Implementation 

The classes in our generalized specifications are 
implemented as a set of two Ada packages. A class 
package implements an abstract data type representing 
the class, and an object manager package contains all 
the objects for a given class. These classes are 
arranged in a hierarchy with category packages 
implementing the interface for a specified category, 
and the Application Interface package implementing a 
root object that dispatches to categories the operations 
to allocate objects, set dependencies and interact with 
parameters (instance variables). The bodies of the 
category packages and the Application Interface 
package implement only dispatching code. All the 
functionality resides in class and object manager 
packages. 
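
The layering described above could look roughly like
the following Python sketch. Python classes stand in
for the Ada packages, and every name and value here
(including the field_of_view default) is an assumption
for illustration. Only the allocation operation is
shown; setting dependencies and parameters would be
dispatched along the same path.

```python
from typing import Dict


class SunSensor:
    """Stand-in for a class package: the abstract data type itself."""

    def __init__(self) -> None:
        self.parameters = {"field_of_view": 64.0}  # illustrative default


class ObjectManager:
    """Stand-in for an object manager package: owns all objects of a class."""

    def __init__(self, cls: type) -> None:
        self._cls = cls
        self._objects: Dict[str, object] = {}

    def allocate(self, name: str) -> object:
        if name in self._objects:
            raise ValueError("object already allocated: " + name)
        self._objects[name] = self._cls()
        return self._objects[name]

    def get(self, name: str) -> object:
        return self._objects[name]


class Category:
    """Dispatching only: forwards to the manager for the requested class."""

    def __init__(self, managers: Dict[str, ObjectManager]) -> None:
        self._managers = managers

    def allocate(self, class_name: str, name: str) -> object:
        return self._managers[class_name].allocate(name)


class ApplicationInterface:
    """Root object: dispatches each request to the right category."""

    def __init__(self, categories: Dict[str, Category]) -> None:
        self._categories = categories

    def allocate(self, category: str, class_name: str, name: str) -> object:
        return self._categories[category].allocate(class_name, name)


manager = ObjectManager(SunSensor)
root = ApplicationInterface({"Sensor": Category({"SunSensor": manager})})
sensor = root.allocate("Sensor", "SunSensor", "sun_sensor_1")
```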

Ada was chosen as a development language for 
two reasons. The organizational reason was that, at the 
beginning of the GSS project, the division had 
experience with several Ada simulators, C++ was not 
considered mature technology by division 
management, and no other language met the need for 
object orientation, support on a wide variety of 
platforms, and a core of experienced developers in the 
FDD. The technical reason was the use of generics to 
add flexibility to the configuration process. 

The GSS generic packages use both types 
(defining the class or category depended on) and 
subprograms (defining messages sent to the class or 
category depended on). The configuration process 
consists of instantiating the generics to set the 
dependencies between classes and categories and 
calling dependency operations to set the actual 
connections between objects. The use of generics 
allows category dependencies to be satisfied by 
classes, bypassing the dispatching code when it is not 
needed. This fact was important in addressing user 
concerns that the overhead of dispatching code would 
hurt run time performance. A class can actually be 
instantiated using any class that provides the 
operations that are needed to match the generic 
parameters. 
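
Python has no direct counterpart to Ada generics, but
the idea can be approximated with a factory function
that binds the operation a class will call on whatever
it depends on. In this sketch the factory plays the
role of generic instantiation and set_source plays the
role of a dependency operation. The class names, the
measure operation, and the numeric models are all
hypothetical.

```python
class SunSensor:
    def measure(self, time: float) -> float:
        return 0.5 * time  # placeholder model


class StarTracker:
    def measure(self, time: float) -> float:
        return 0.25 * time  # placeholder model


def sensor_category_measure(sensor, time: float) -> float:
    """Category-level dispatch: picks the operation from the object's class."""
    table = {SunSensor: SunSensor.measure, StarTracker: StarTracker.measure}
    return table[type(sensor)](sensor, time)


def instantiate_estimator(measure):
    """Analogue of instantiating a generic with a subprogram parameter."""

    class Estimator:
        def __init__(self) -> None:
            self._source = None

        def set_source(self, source) -> None:
            """Dependency operation: connect to one specific object."""
            self._source = source

        def estimate(self, time: float) -> float:
            # The bound operation is the "message" sent to the dependency.
            return measure(self._source, time)

    return Estimator


# Bound directly to one class: no dispatching code is involved.
DirectEstimator = instantiate_estimator(SunSensor.measure)
# Bound to the category: any sensor class can be connected later.
GeneralEstimator = instantiate_estimator(sensor_category_measure)

direct = DirectEstimator()
direct.set_source(SunSensor())
general = GeneralEstimator()
general.set_source(StarTracker())
```

The direct instantiation trades flexibility for speed,
which corresponds to the performance concern described
above.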

Code Generation (Classgen) 

The code for the allocation, dependency and parameter 
operations is similar in structure from class to class, 
but each of these operations depends on the 
specification of the particular class. This means that 
the implementation code cannot be written at the root 
of the classification tree, but that there is still a lot of 
tedious repetition to the coding of classes. The 
development team's solution was to write a code 
generator (named Classgen) that reads in a concise 
notation describing class functions, dependencies and 
parameters. The output of the code generator is the 
implementation of all the functions specified at the 
Application Interface level, plus subprogram interfaces 
and stubs for the constructors and selectors defined in 
the specification document. This was made possible 
by the existence of a generalized design that mapped 
standard specification features into the Ada 
implementation. 

The input language for Classgen also has features 
corresponding to those defined in the specification 
concepts, and adds design features such as the error 
handling. The tool generates a type definition for a 
class that contains all the parameters, internal data, and 
dependencies defined for the class, implementation of 
stubs for the functions in the specification, and 
implementation of the subprograms needed for 
allocating objects, setting dependencies, and accessing 
or modifying parameter values. This code is about 
75% of the code needed to implement a class, with one 
line of Classgen input generating about 10 lines of Ada 
code. Classgen generates all the code that can be 
generated based on the standardized specifications and 
design. The remaining code is the implementation of 
the functions specified for the class. 
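
To make the idea concrete, here is a toy generator in
Python. The three-keyword input notation and the shape
of the emitted Ada-style declarations are invented for
this sketch; they are not the Classgen grammar or its
actual output, neither of which is reproduced in this
paper.

```python
def generate(spec_text: str) -> str:
    """Turn a tiny class description into Ada-style declarations."""
    class_name = None
    params = []     # (name, type) pairs
    functions = []  # names of functions to be stubbed
    for raw in spec_text.splitlines():
        words = raw.split()
        if not words:
            continue
        keyword = words[0]
        if keyword == "class":
            class_name = words[1]
        elif keyword == "parameter":
            params.append((words[1], words[2]))
        elif keyword == "function":
            functions.append(words[1])
        else:
            raise ValueError("unknown keyword: " + keyword)
    if class_name is None:
        raise ValueError("missing class line")

    out = ["type %s is record" % class_name]
    if not params:
        out.append("   null;")
    for name, typ in params:
        out.append("   %s : %s;" % (name, typ))
    out.append("end record;")
    # Access and modify operations follow mechanically from each parameter.
    for name, typ in params:
        out.append("function Get_%s (Object : %s) return %s;"
                   % (name, class_name, typ))
        out.append("procedure Set_%s (Object : in out %s; Value : in %s);"
                   % (name, class_name, typ))
    # Specified functions get an interface only; bodies are written by hand.
    for fn in functions:
        out.append("procedure %s (Object : in out %s);" % (fn, class_name))
    return "\n".join(out)


SPEC = """
class Sun_Sensor
parameter Field_Of_View Real
function Measure
"""

generated = generate(SPEC)
```

Even in this toy form, each parameter line expands into
several output lines, which is where the leverage of
such a tool comes from.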

Classgen Lessons 

Having a code generator has saved time and effort on 
the GSS project, but it has taken time for the tool to 
mature. The main reason for this is that the initial 
concept was for Classgen to be run once per class to 
generate the code, with the created files edited by hand 
after that. In practice it was necessary to edit 
regenerated code, both because the generalized design 
evolved and required changes and because the 
developers used the tool to regenerate files if there 
were substantial modifications to a class. The problem 
was that the original version of Classgen required the 
developer to edit most of the files generated for a class. 
A notation was defined to mark these changes, but 
regenerating the class meant having to merge these 
changes into the new file. Classgen has been modified 
in stages so that in most cases the only file a developer 
edits is a separately compiled "subunit" file in which 
the specified functions are implemented. Changes to 
the other files still occur, but they are rare enough and 
small enough that they don't have a major impact. 
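
The regeneration policy that results can be sketched in
Python as follows. The file names and contents are
placeholders, not Classgen's real outputs. Generated
files are overwritten on every run, while the file that
holds hand-written function bodies is created once and
never touched again.

```python
import tempfile
from pathlib import Path


def regenerate(out_dir: Path, class_name: str,
               generated: str, stub: str) -> None:
    """Rewrite the generated file; create the hand-edited file only once."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / (class_name + "_generated.txt")).write_text(generated)
    subunit = out_dir / (class_name + "_subunit.txt")
    if not subunit.exists():
        subunit.write_text(stub)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    regenerate(out, "sun_sensor", "generated v1", "-- TODO: implement")
    # The developer fills in the function bodies by hand.
    (out / "sun_sensor_subunit.txt").write_text("hand-written body")
    # The specification changes and the class is regenerated.
    regenerate(out, "sun_sensor", "generated v2", "-- TODO: implement")
    generated_after = (out / "sun_sensor_generated.txt").read_text()
    subunit_after = (out / "sun_sensor_subunit.txt").read_text()
```

With this split, no merge step is needed after a class
is regenerated.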

These changes were generally made by extending 
the Classgen grammar, but in some cases the 
generalized design was modified to facilitate code 
generation. A simple example of this was to move all 
"with" clauses (which define dependencies between 
Ada packages) into package specifications, and to have 
all utility packages imported into a class be "with"ed 
in the Classgen input as well. This sometimes makes 
packages visible in a larger amount of code than 
strictly necessary, but it captures the design 
information in the Classgen input and removes the 
need to edit files to add the importing of packages. 

Process Lessons 

The use of standardized, object-oriented specification 
concepts has had several effects on development. We 
have already noted that the specifications are complete 
enough to serve as PDL. The specification of 
dependencies between classes, together with the 
generalized design for dependencies, completely 
captures the system structure typically defined in 
preliminary design. The development of a build 
typically starts with detailed design of classes, which is 
expressed in terms of changes to the specification. 
Given this shift of "design" work to the domain 
analysis team, a joint "domain analysis and design" 
team may be justified. This is particularly true once 
the class library is populated and changes to the 
domain may have major effects on the existing code. 

Using object-oriented specifications will enable 
incremental development. However, the flight 
dynamics domain is one where a substantial number of 
core classes (integrators, dynamic models, 
environmental models, etc.) are needed before anything 
useful can be done. The builds are still being done 
incrementally, but a system that is testable by the end 
user won't be available until the third build is 
complete. The good news is that once the first 
application is complete, added capabilities can be 
created in single builds. For example, the first two 
releases of GSS will be delivering components to 
support simulation and real-time attitude estimation in 
a total of 5 builds. Adding the generalized 
components for non-real time estimation and for 
sensor calibration will take one or two additional 
builds. Similar scale builds can be used to add new 
models to the existing categories, or to expand into the 
orbit or maneuver planning areas. Thus "design a 
little, code a little, test a little" will work for GSS, but 
only after a base of core classes has been implemented. 

The integration of these generalized classes has 
been easier than for typical projects. This is another 
benefit of having standard object-oriented 
specifications that clearly define internal and external 
interfaces, and a generalized design that standardizes 
the implementation of dependencies between classes. 
Together these factors assure that if a class depends on 
a given operation from another class that class will 
provide the operation and the two classes will interface 
correctly. 

Summary 

The lessons described above have been learned 
during the specification and the early development of 
the GSS project. These lessons will be applied to 
further specification and development work. The 
initial releases will be complete by the end of 1995, at 
which point the FDD will start seeing return on the 
investment in this project. 